The November 2022 Edition of The Resource
Hello Reader, here is this month’s iRODS news and developments!
If you’re facing an issue with iRODS you’re not sure how to solve, please do drop me a line; if I’ve come across a solution or seen something relevant elsewhere, I’ll do my best to let you know. Or just drop me a mail to say ‘Hi’. Always nice to hear from people, particularly in these pandemic times!
I’d love your thoughts and feedback on how this newsletter could be better for you.
News
November has been a quiet month!
Not for me - I’ve been working through the maze of dependencies, upgrading 200+ systems from 4.2.7 to 4.2.11. Not there yet - my dev systems are updated, but still have some issues. Hopefully I’ll be able to report successful production upgrades in the December newsletter. What have you been working on?
I’ve joined Mastodon, as many have at the moment; feel free to connect with me at @kript@mastodon.theultraworld.org. Not all my posts are iRODS related, though!
Main Repository Activity
Open Issues
Incorrect use of gethostname() and HOST_NAME_MAX
iRODS misuses POSIX's gethostname() by passing a buffer of size HOST_NAME_MAX.
irods::server_properties::map() can result in a data race
Use of the map() member function is very convenient as it grants access to the underlying nlohmann::json object directly. However, it can result in data races because it bypasses the synchronization mechanisms used internally by the irods::server_properties instance.
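The snippet referenced in the issue is not reproduced here. As a rough analogy only (iRODS itself is C++, and every class and method name below is hypothetical), this Python sketch shows why handing out the underlying container sidesteps the lock that guards the normal accessors.

```python
import copy
import threading


class ServerProperties:
    """Hypothetical stand-in for a lock-protected configuration object."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._data = dict(initial)

    def get(self, key):
        # Synchronized read: the lock is held while the value is copied out.
        with self._lock:
            return copy.deepcopy(self._data[key])

    def set(self, key, value):
        # Synchronized write.
        with self._lock:
            self._data[key] = value

    def map(self):
        # Unsynchronized escape hatch: the caller receives the live dict and
        # can read or mutate it while another thread is inside set().
        return self._data


props = ServerProperties({"log_level": {"server": "info"}})

live = props.map()                 # bypasses the lock entirely
snapshot = props.get("log_level")  # private copy taken under the lock

props.set("log_level", {"server": "debug"})
```

After the call to set(), the snapshot still holds the old value, while the object returned by map() reflects the change immediately because it is the same dictionary the writer is modifying.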
LDAP Integration Feature request
Would you like iRODS to integrate with LDAP? What should it do when it loses connectivity to its LDAP system? Join the discussion above and let the maintainers know, especially if this is something you would find beneficial!
irods-grid is sensitive to ordering of entries in /etc/hosts
FQDN goes first, which is a convention, but it really shouldn’t break things.
Document which configuration properties can be changed post setup
Not all config properties are allowed to be changed post setup. Therefore, docs.irods.org needs to list which properties are safe to change post setup.
I think this refers to the unattended installation setting, where you pass the setup script a JSON file. There is nothing stopping you doing this multiple times, however the changes are not always idempotent. See the next issue!
Document unattended installation’s overwrite behavior
Unattended installs will completely overwrite the contents of server_config.json and irods_environment.json. This can lead to a nonfunctional server if information shared between the config files and the database becomes out of sync.
docs.irods.org needs to include a few statements about this behavior.
Rename base64_encode and base64_decode
I have found when testing the Globus plugin (client to iRODS) that we appear to be linking to a version of base64_encode and base64_decode that is not the intended version. These function names are too generic and should be placed into a namespace or something.
Have resource server use its default_file_mode configuration value when creating local replicas
Enhancement request:
One use case of iRODS is to colocate a data consuming service on the resource server hosting the data for this service. The service needs file system level permission to access the data, which can be configured using default_file_mode. Currently, iRODS chooses the value of default_file_mode set on the iRODS server the uploading client connects to, i.e., the client's irods_host configuration value. If the client connects to the zone's canonical iRODS host, which may be a load balancer, it is likely that the client won't connect to the colocated resource server, and this server's default_file_mode won't be chosen. To prevent this from happening, the default_file_mode on all of the iRODS servers needs to be set to the value required on the colocated one. This isn't intuitive, and it's not always desirable.
Could iRODS be changed so that the selected default_file_mode comes from the resource server hosting the storage resource chosen for a new replica? Or maybe the unixfilesystem resource could be modified to accept file mode as a context value.
msiSetDefaultResc / acSetRescSchemeForCreate no longer force Resource write when incorrect resource given
This is working as designed.
This bugfix was part of #4084 for 4.2.9.
Diff for the docs: irods/irods_docs@d2631f0
Note the difference in the last row of the tables in 4.2.7:
https://docs.irods.org/4.2.7/system_overview/configuration/#default-resource-configuration
vs 4.2.11
https://docs.irods.org/4.2.11/system_overview/configuration/#default-resource-configuration
irods::client_api_allowlist::enforce is marked noexcept, but can throw exceptions
Document when the server requires a restart in regard to SSL configuration changes
For the avoidance of doubt, restart the server when you do this, until a more canonical answer emerges. I can verify that iRODS will continue to read the old one unless you do, at least for some of the processes, which is enough to basically stop it working. This is puzzling because stracing the server when it starts a new rodsAgent process shows it reading the cert file.
test_ifsck__2650 test failure
Failure between 4.3.0 and 4.3.1, so current 4-3-stable
Refactor user administration API to throw exceptions instead of return error codes
Re-enable test_auth
Add remove_if_exists(file) and make_arbitrary_file to lib.py
These are in the s3 plugin (s3plugin_lib.py). They have been requested to be added to lib.py.
irodsDelayServer does not start because FQDN is configured as delay_server leader
This seems to be caused by the fact that the initial delay_server leader is set to the FQDN of my iRODS host, while the code uses the (short) hostname for comparison.
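The mismatch described above can be illustrated with a small Python sketch. It is not the delay server's code (which is C++); the function names are hypothetical, and the tolerant comparison is only one possible way of reconciling an FQDN with a short hostname.

```python
def short_name(host):
    """Return the first DNS label, lower-cased ('irods.example.org' -> 'irods')."""
    return host.split(".", 1)[0].lower()


def naive_is_leader(configured_leader, local_hostname):
    # Plain string comparison: fails when one side is an FQDN and the
    # other is a short hostname, even if they name the same machine.
    return configured_leader == local_hostname


def tolerant_is_leader(configured_leader, local_hostname):
    a = configured_leader.lower().rstrip(".")
    b = local_hostname.lower().rstrip(".")
    # Two fully qualified names must match exactly, so hosts with the same
    # short name in different domains are not confused with each other.
    if "." in a and "." in b:
        return a == b
    # Otherwise fall back to comparing the first label only.
    return short_name(a) == short_name(b)
```

With a leader configured as irods.example.org (a made-up name) and a local hostname of irods, the naive check reports that this host is not the leader, while the tolerant check accepts it.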
Error when calling msiDataObjChksum from acPostProcForFilePathReg
Initial report:
We had this configuration in iRODS 4.2.8, and we never got this error. I think because acPostProcForFilePathReg was not triggered when an object was put. This seems to have changed somewhere between 4.2.8 and 4.3.0.
We used this rule to create a checksum for objects that are registered using ireg.
And the response:
Yes, in 4.2.9, with the work done to unify many of the codepaths and provide logical locking - the registration for both 'large' AND 'small' files now happens prior to data being written to disk - as a placeholder for the locking to have a thing to hold.
In 4.2.8 and before, registration-before-data-on-disk would have only happened for 'large' files that triggered parallel transfer (default, >32MB).
Please try pep_api_phy_path_reg_post() instead... I believe this will not fire for an iput, but will fire for ireg.
cross-zone connections between irods servers in non-federated setting
For easy reproduction, the following should trigger the bug:
Run an irods server version 4.2.11
Run iinit from another host with icommands version 4.2.10 or 4.2.11 and a readable /etc/irods/server_config.json containing only:
{
    "negotiation_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "zone_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
Document mapping between client-side APIs and dynamic PEPs
Allow connection_pool and client_connection to refresh connections after N API requests
#6593 means we need to consider a few things regarding long running agents:
How do we handle memory leaks?
How do we handle admins modifying policy, primarily for the iRODS rule language?
One way to get around this is to introduce a counter that is associated with each connection. This counter will represent the number of requests processed by the agent. Once the counter exceeds N requests, the connection is replaced by a fresh connection.
Replacing the connection enables the following:
The agent servicing the requests is shut down
Memory leaked by the agent is returned to the OS
The new agent sees the updated policy
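The counter idea above can be sketched in a few lines of Python. This is only an illustration of the technique, not the proposed connection_pool or client_connection change; RefreshingConnection, FakeConnection and max_requests are hypothetical names, and the sketch replaces the connection just before request N+1 is served.

```python
class RefreshingConnection:
    """Wraps a connection and replaces it after max_requests requests."""

    def __init__(self, connect, max_requests):
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        self._connect = connect            # factory returning a new connection
        self._max_requests = max_requests
        self._conn = connect()
        self._count = 0                    # requests served by the current connection

    def request(self, func, *args):
        # Once the current connection has served N requests, close it (ending
        # the agent behind it) and open a fresh one before continuing.
        if self._count >= self._max_requests:
            self._conn.close()
            self._conn = self._connect()
            self._count = 0
        self._count += 1
        return func(self._conn, *args)


class FakeConnection:
    """Stand-in for a real connection so the sketch runs without a server."""

    created = 0

    def __init__(self):
        FakeConnection.created += 1
        self.id = FakeConnection.created
        self.closed = False

    def close(self):
        self.closed = True


conn = RefreshingConnection(FakeConnection, max_requests=3)
ids = [conn.request(lambda c: c.id) for _ in range(7)]
# ids == [1, 1, 1, 2, 2, 2, 3]
```

Seven requests with N set to 3 are served by three successive connections, so any memory leaked or stale policy held by the first two agents goes away when they are closed.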
iadmin mkzone should report an error when given invalid connection information
rstrcpy needs to log source string on error
Remove responsibility of freeing heap allocated memory from packstruct on API response
msiExecCmd_bin directory is owned by root:root after force-reinstalling packages via rpm
Deadlock in MySQL database plugin on many concurrent inserts
The MySQL function R_ObjectId_nextval() is not safe for parallel database updates, which some of our users can trigger by opening a large number of parallel connections.
iunreg should instruct unlink API to skip vault check
This message was added to leave signal that an attempt was made to unlink a file not found in an iRODS vault, leading to potential data loss. However, iunreg is not attempting to unlink the file; so, the log message is superfluous. A keyword called RESOURCE_SKIP_VAULT_PATH_CHECK_ON_UNLINK was provided to instruct the API to skip the vault check, and iunreg should provide this in its call to rcDataObjUnlink.
document iquest attrs
Improper delay execution frequency time is Ignored or Misinterpreted
Worth noting if you use delay rules with an execution frequency.
Update documentation for ibun/msiTarFileExtract()
Closed Issues
Closed on - 2022-11-11 16:23:18 icommands should compile against the same C++ standard as the server
Closed on - 2022-11-04 20:32:40 GitHub actions for clang-format and clang-tidy no longer work due to Ubuntu 22.04
Closed on - 2022-11-04 20:32:30 User administration C++ library cannot query info about remote users
Closed on - 2022-11-07 20:42:29 Remove rule_texts_for_tests.py
Closed on - 2022-10-26 22:57:30 Administration libraries do not pass down include dirs properly
Closed on - 2022-11-07 19:41:55 Document maximum_size_of_delay_queue_in_bytes
maximum_size_of_delay_queue_in_bytes (optional) (default 0) - The maximum number of bytes available to the delay queue. When set to 0, the delay server will use as much memory as it needs to hold queued rules.
Closed on - 2022-11-04 20:32:19 Don’t allow clang-format to format api_plugin_number_data.h
Closed on - 2022-10-25 21:02:51 Clang-Tidy GitHub workflow cannot find catch2 headers
Closed on - 2022-10-25 21:02:38 Memory leak in PackStruct unit test
Closed on - 2022-10-19 19:13:47 Clang-Format: Disable Preprocessor formatting
Closed on - 2022-11-09 21:42:08 Remove log_facility property from log message output
Closed on - 2022-11-09 21:42:16 Add zone name to log message output
Closed on - 2022-11-08 19:29:57 Refactor client_api_allowlist interface to match style of replica_access_table, etc
Closed on - 2022-11-04 16:13:09 Deprecate SimpleQuery
Closed on - 2022-10-20 21:10:43 iadmin fails to list user in case of particular username length combinations
Targeted at 4.2.12
I have confirmed that replacing the SimpleQuery implementation with a GenQuery implementation resolves this issue. I am going to try to replace the other SimpleQuery uses in iadmin as part of this effort so that we can eventually remove it as this seems to be the last remaining holdout.
Closed on - 2022-11-07 20:45:23 closeAllL1desc should not call PEPs
Closed on - 2022-11-09 21:41:44 Refactor resource administration API to throw exceptions instead of return error codes
Closed on - 2022-10-25 21:02:15 Add C++ library for managing zones
Targeted at 4.3.1
The library should be modeled after the user/group administration library with the goal of providing a modern interface and simplifying usage of the zone management features provided by the iRODS C API function, rxGeneralAdmin.
Closed on - 2022-10-25 21:02:02 Expose utility functions used by the User Administration library
Closed on - 2022-11-04 20:33:25 Allow identity of user attached to connection/agent to change in real time
iRODS connections tie the identity of a user to the socket. This is fine for one-off commands, but not for situations where there can be hundreds to thousands of concurrent users. Creating a new connection for every user will quickly drain resources.
Therefore, iRODS should provide a way to change the user identity tied to the connection object. This would lead to huge improvements regarding performance, scalability, and resource management.
This would also improve support for client applications because the client libraries would finally be able to implement real connection pooling for iRODS connections.
Closed on - 2022-10-27 00:21:34 JSON Schema validation paths are incorrect for non-package installs
Closed on - 2022-10-25 21:02:25 Consider adding feature test macros
Closed on - 2022-10-20 15:01:47 ichmod should not be allowed to bypass the permission model
ichmod is currently allowed to bypass the permission model when the user adjusting the permissions matches the original owner.
Only users with own permissions or an admin should be allowed to restore alice's permissions.
Closed on - 2022-11-04 19:17:26 Refactor / Modernize main server logic (rodsServer.cpp, etc.)
Closed on - 2022-11-09 21:15:03 Delay server should not log a stack trace when default config value is used
Closed on - 2022-10-21 20:38:10 Delay server adds completed rules back to queue, race condition, then complains loudly
Closed on - 2022-11-08 21:14:33 server-side irods_environment.json doc and validation schema bugs
Closed on - 2022-11-07 19:41:45 Client API Allowlist option does not align with description at docs.irods.org
Closed on - 2022-11-09 21:15:27 add delay server memory usage default to server_config.json on upgrade
Closed on - 2022-11-08 04:03:17 Non-admins should not be allowed to run iadmin lg
Closed on - 2022-11-07 15:55:42 Atomic metadata update api lookup fails
Closed on - 2022-11-04 20:03:24 provide client connection information to acPreConnect
User requested to be able to determine whether to use SSL or not based on the incoming network connection (internal network or external). Original thread at https://groups.google.com/g/irod-chat/c/3afhUiB2A0k.
This would be done within acPreConnect(), however, there is no connection information available to that PEP to make an internal/external determination. It is handed a manufactured, empty rei.
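To make the requested internal/external decision concrete, here is a minimal Python sketch of the logic only. It assumes the client address is available as a string, which is exactly what the issue says was missing; the network ranges are hypothetical, and the returned strings are assumed to correspond to the iRODS negotiation values CS_NEG_DONT_CARE and CS_NEG_REQUIRE. It does not show how acPreConnect actually receives connection information.

```python
import ipaddress

# Hypothetical site-specific ranges considered "internal".
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def negotiation_policy(client_addr, internal_networks=INTERNAL_NETWORKS):
    """Return the negotiation setting to apply for a given client address."""
    addr = ipaddress.ip_address(client_addr)
    # Membership tests across IP versions simply evaluate to False.
    internal = any(addr in net for net in internal_networks)
    # Assumption: internal clients may skip SSL, external clients must use it.
    return "CS_NEG_DONT_CARE" if internal else "CS_NEG_REQUIRE"
```

For example, negotiation_policy("10.1.2.3") falls inside the first range and yields the relaxed setting, while an address outside both ranges yields the strict one.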
Closed on - 2022-10-24 14:54:57 Fix delay hints parser for “DOUBLE” directives
Closed on - 2022-10-24 15:32:56 Add detached mode to unixfilesystem plugin
Very interested in this one:
Add detached mode to UFS.
This will be similar to what is done for the cacheless S3 plugin. Any resource server can serve up the request.
Python iRODS Client Activity
Open Issues
Expose errno code in string representation of an irods.exception
It was noted in this previous issue and comment that more information could be given in the product of repr(e), with e being an irods.exception returned to the PRC client from the iRODS server. Specifically, the errno code provided by the OS and propagated back to the client (for example, e.code is 28 -> ENOSPC -> 'No space left on device') is essential information which rightfully should be expected as part of the repr(e) output.
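What such a repr could look like is sketched below using only the Python standard library. IrodsLikeError is a hypothetical class, not part of the Python iRODS Client, and the output format is an assumption made for illustration.

```python
import errno
import os


class IrodsLikeError(Exception):
    """Hypothetical exception carrying an OS errno in its 'code' attribute."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code

    def __repr__(self):
        base = f"{type(self).__name__}({self.args[0]!r}"
        if self.code is None:
            return base + ")"
        # Translate the numeric errno into its symbolic name and OS message.
        name = errno.errorcode.get(self.code, "UNKNOWN")
        return f"{base}, errno={self.code} [{name}: {os.strerror(self.code)}])"


e = IrodsLikeError("put failed", code=errno.ENOSPC)
print(repr(e))
```

The printed text includes the numeric code, the symbolic name ENOSPC and the platform's own message for it, so the cause is visible without a separate lookup.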
Closed Issues
Closed on - 2022-11-13 16:04:45 Open socket connections can still cause log noise when gc-collected in Python
Closed on - 2022-11-13 16:05:11 Generate SSL context from iRODS settings
To make SSL connections follow the configured irods_ssl_* settings more naturally, allow the SSLContext to be generated automatically by the client library instead of relying on the user to provide a default-generated context.
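A minimal sketch of deriving a context from settings, using Python's standard ssl module, might look like the following. It is not the client library's implementation; the helper name is hypothetical, and the handling of irods_ssl_ca_certificate_file and irods_ssl_verify_server (with values "none", "cert" and "hostname") is an assumption about how those settings would map onto an SSLContext.

```python
import ssl


def make_ssl_context(settings):
    """Build an ssl.SSLContext from a dict of irods_ssl_* settings (sketch)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

    ca_file = settings.get("irods_ssl_ca_certificate_file")
    if ca_file:
        # Trust the CA bundle named in the settings.
        ctx.load_verify_locations(cafile=ca_file)

    verify = settings.get("irods_ssl_verify_server", "hostname")
    if verify == "none":
        # check_hostname must be disabled before verification is turned off.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    elif verify == "cert":
        # Validate the certificate chain but not the host name.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_REQUIRED
    # "hostname" keeps the defaults: chain and host name are both checked.
    return ctx
```

A caller would pass the parsed contents of its environment file, for example make_ssl_context({"irods_ssl_verify_server": "cert"}), and hand the result to whatever opens the connection.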
Closed on - 2022-11-10 05:21:23 Fix password_obfuscation in Windows
The Python os library module in Windows does not implement the getuid() function. As a result, attempting to encode()/decode() a password with the password_obfuscation module results in an AttributeError. The getlogin() function, however, is implemented for Windows. Provided the login can be assumed to be as unique as a uid, this can be used as replacement salt for the process.
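The fallback described above can be sketched as follows. The helper name obfuscation_salt is hypothetical and this is not the fix that was merged; it only shows the getuid()-else-getlogin() pattern, with the os module passed in as a parameter so both branches can be exercised on any platform.

```python
import os
from types import SimpleNamespace


def obfuscation_salt(os_module=os):
    """Return a per-user value: the uid where available, else the login name."""
    getuid = getattr(os_module, "getuid", None)
    if getuid is not None:
        return str(getuid())
    # Windows: os.getuid() does not exist, so fall back to the login name.
    return os_module.getlogin()


# Stand-ins for the two platforms (hypothetical values).
posix_like = SimpleNamespace(getuid=lambda: 1000, getlogin=lambda: "alice")
windows_like = SimpleNamespace(getlogin=lambda: "alice")

posix_salt = obfuscation_salt(posix_like)      # "1000"
windows_salt = obfuscation_salt(windows_like)  # "alice"
```

Note that the two branches return different kinds of values, so a password encoded on one platform would not be expected to decode on the other.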
Closed on - 2022-10-18 17:41:59 Rule execution with a file with null input throws an error -1201000
Closed on - 2022-10-22 17:23:28 Large put() over federation leaves “valid” replicas of incorrect size and checksum when interrupted
The action is to move to 4.2.12 / 4.3.0! There are notes about various iRODS versions in the README already, and it would be helpful to have an additional note to the effect that PRC put != iput and the consequences of how that interacts with different iRODS versions.
Closed on - 2022-11-12 13:59:27 irods.exception.SYS_NO_API_PRIV when groupadmin creates group
Closed on - 2022-10-18 15:08:34 Rule execution should allow the ‘null’ input parameter for the rule_file (.r)
YODA Activity
Open Issues
[FEATURE] statistics overview split between research, vault, and total used.
The statistics module shows total used storage by category and group. But no distinction is possible between research area, vault area, and total used storage.
[FEATURE] support for subdomain in ansible parameter external_users_domain_filter and oidc_domains
The configuration for OIDC domains works only for the root domain, for example surf.nl.
If the organization has users with emails with subdomains, they are not matched by the current Yoda rules.
For example, mydepartment.surf.nl is not matched if the oidc_domains list contains surf.nl.
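The difference between the current behaviour and the requested one can be shown with a short Python sketch. These functions are hypothetical and are not taken from the Yoda ruleset; they only contrast exact domain matching with subdomain-aware matching.

```python
def email_domain(email):
    """Return the lower-cased part after the last '@'."""
    return email.rsplit("@", 1)[-1].lower()


def exact_match(email, domains):
    # Behaviour described in the issue: only the listed domain itself matches.
    return email_domain(email) in {d.lower() for d in domains}


def subdomain_aware_match(email, domains):
    # Requested behaviour: the listed domain or any subdomain of it matches.
    # The leading dot prevents 'notsurf.nl' from matching 'surf.nl'.
    domain = email_domain(email)
    return any(
        domain == d.lower() or domain.endswith("." + d.lower())
        for d in domains
    )
```

With oidc_domains set to ["surf.nl"], an address at mydepartment.surf.nl fails the exact check but passes the subdomain-aware one.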
[BUG] changing the subcategory in group properties for a datamanager group does not work
Closed Issues
Closed on - 2022-10-28 10:31:35 [FEATURE] Data Access Password expiration notification
When a Data Access Password has expired, a WebDAV client will throw an authentication error. This might lead to unneeded support calls when researchers forget they have to set a new password themselves.
Describe the solution you'd like
Send an automatic notification to the user X hours (configurable for the instance) before a Data Access Password expires. The existing notification functionality in 1.8 seems suitable for this.
Will be released with v1.9.0.
Closed on - 2022-10-25 10:05:14 [FEATURE] DOI versioning
DOI versioning - quite a detailed request.
Support for DOI versions is added in UtrechtUniversity/yoda-ruleset@2b15ff9
Will be released with v1.9.0.
If you think someone else would appreciate this newsletter, they can sign up at https://theresource.metadata.school/
No Yaks were shaved in the making of this newsletter. It had to happen some time…