August 2021 Edition of The Resource
Hello Reader, here is this month’s iRODS news and developments!
If you’re facing an issue with iRODS you’re not sure how to solve, please do drop me a line; if I’ve come across a solution or seen something relevant elsewhere, I’ll do my best to let you know. Or just drop me a mail to say ‘Hi’. Always lovely to hear from people, particularly in these pandemic times!
I’d love your thoughts and feedback on how this newsletter could be better for you.
News
Anniversary!
A whole year of doing this newsletter - wow!
Although I intend it to arrive mid-month, I find it often creeps towards the end…
TRiRODS - August 2021
Delay Server Availability and Scalability in C++ starring Violet White, iRODS Consortium (intern)
I admit I haven’t watched this yet, but I am impressed at going from not having seen the codebase to implementing things in the delay server in such a short time.
Main Repository Activity
Open Issues
-fpermissive required to build with GCC
If you care about building iRODS from scratch, the following few issues show the work that ‘SwooshyCueb’ has been putting into this refactor.
[gcc] -Wformat-overflow warning in nre.reHelpers1.cpp
[gcc] -Wformat-zero-length warning in db_plugin.cpp
[gcc] -Wtype-limits warning in voting.cpp
[gcc] -Wunused-but-set-variable warnings in irods repo
[gcc] -Wsizeof-pointer-memaccess warning in test_packstruct.cpp
[gcc] -Wmissing-field-initializers warning in test_rerror_stack.cpp
[gcc] -Wcatch-value warnings in irods repo
[gcc] -Wsequence-point warning in connection_pool.cpp
[gcc] -Wformat-truncation warnings in irods repo
[gcc] -Wignored-qualifiers warnings in irods repo
[gcc] -Wstringop-truncation warnings in filesystem.cpp
[gcc] -Wunused-variable warnings in irods repo
[gcc] -Wimplicit-fallthrough warnings in irods repo
[gcc] -Wclass-memaccess warnings in irods repo
msiGetStderrInExecCmdOut segfaults when using failed *out as input
Something to be aware of if your rule base uses msiExecCmd and you are checking the exit status.
Delayed rule execution status no longer recorded in ICAT
"In <= 4.2.4, you can use iquest '%s' "select count(RULE_EXEC_ID) where RULE_EXEC_STATUS = 'RE_RUNNING'" to count the number of delayed rules currently being executed, but after a refactor, this was no longer possible, so another way is being looked into to show the in-memory queue."
I admit I didn’t know this was a thing! I should probably spend more time in the database…
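If you are still on 4.2.4 or earlier and want that count in a script, here is a rough sketch in Python. It assumes the iquest icommand is on your PATH and that you have already authenticated; the function names are my own and not part of iRODS, and I have not checked what iquest prints in every edge case, so treat the int() conversion as optimistic.

```python
import subprocess

# Query text taken from the issue quoted above; only meaningful on iRODS <= 4.2.4.
RUNNING_RULES_QUERY = (
    "select count(RULE_EXEC_ID) where RULE_EXEC_STATUS = 'RE_RUNNING'"
)


def build_iquest_command(query=RUNNING_RULES_QUERY):
    """Return the argv list for iquest, using '%s' to print one bare value per row."""
    return ["iquest", "%s", query]


def count_running_delayed_rules():
    """Run iquest and return the count as an int (assumes iquest is installed)."""
    result = subprocess.run(
        build_iquest_command(), capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())
```

Passing the command as a list avoids the nested shell quoting that makes the one-liner awkward to type.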
cyberduck and irods file transfer
Despite the issue title, this may be memory usage of the Audit plugin, so if you are using that or planning to, it’s worth keeping an eye on this issue - or chiming in with your thoughts/expectations.
irodsReServer segfaulting while freeing unpacked REI
If you write delay rules, I think it’s worth reading through and keeping an eye on this issue.
verbose reporting of src & dest iRES, and other hosts involved
"When transferring files, particularly with irepl, sometimes we’re having performance problems and would like to know which hosts are involved. The current batch of problems are hardware/firmware somewhere between the two RAID controllers involved.
Please can we have a level of verbosity, by -v -v -v or otherwise, which gives this kind of information?"
Verbose reporting of this kind could be implemented with the Rule Engine, but it could slow things down a lot, as well as requiring an aggregate view of all the server logs (which you have already, right? Right?), and potentially some way to tie together the processes - always tricky across servers, and more so across Federated Zones.
Investigate checking path permissions when registering an intermediate replica
"There is a check for whether or not to perform a path permission check when registering an intermediate replica. The permissions check is always bypassed when checking while REGISTER_AS_INTERMEDIATE_KW is set. This seems wrong and should be investigated. The resolution may just be a better explanatory comment, but ideally the permissions check should be following the configured policy on the server."
itouch fails to create data object when connected to catalog consumer
The real oddity to this issue is that it occurs on a consumer, but not, seemingly, on a provider.
univMSSInterface.sh missing after upgrade of irods 4.1.x to 4.2.8
TL;DR: if you use /var/lib/irods/iRODS/server/bin/cmd on 4.1.x or /var/lib/irods/log/msiExecCmd_bin in 4.2.x, upgrade straight to 4.2.10 if you can and skip anything earlier.
iput stuck (hanging), zero bytes transferred
Still under investigation - could be file system issues, either at the server end or local (in this case, Lustre).
rodsuser no longer has permission to GET_HOST_FOR_PUT_AN or GET_HOST_FOR_GET_AN
For those coding against the API (C in this case), it’s worth testing your code against 4.2.10, as this behaviour has changed since 4.2.7.
Object locking/state issues when iput/irm clients contend heavily for one file
Sure, many clients shouldn’t be trying to create or delete the same file, but if you expose your iRODS to cluster users, you can bet someone will do this at some time, intentionally or otherwise! It’s a tricky issue to solve in a distributed system.
iquest cannot match names containing apostrophes via = operator if it’s not the last condition in the where clause
release activities for 4.2.11
Steady, not yet - just a placeholder!
upgrade 4.1 to 4.2.8 on catalog consumer fails to migrate command scripts
Quite an odd one, this! But of interest to you, perhaps, if you make use of /var/lib/irods/msiExecCmd_bin
"I observed that /var/lib/irods/msiExecCmd_bin - the directory itself, only - changes permissions from irods:irods to root:root on upgrade from 4.2.9 to 4.2.10+ on the centos:7 docker image… only if the iRODS server is a catalog service consumer (CSC)."
Allow migration of the delay server
Quite a good thread about what’s needed to move the Delay server between servers, with the latest update being this rather excellent summary (IMO);
"We’re not designing a distributed data store… and don’t need the availability of the delay_server.
I think this is the set of design goals…
- we're not a fully distributed system, and don't want to become one(?)
- clear point of control / truth
- easy to reason about (and therefore... fix) if/when it goes sideways
- the delay server itself is not required to be immediately/always available/active
With the leader/successor protocol above, we would be prioritizing:
- the single source of truth (probably in the r_grid_configuration table)
- the single point of control (admin set_delay_server)
- no special hats to wear, all servers running the same code (already true)
- arriving in a steady-state relatively quickly
- the "quickly" knob here would be the "utility function" which determines when the successor declares the leader dead and then promotes itself to leader (for example, it could be 3 consecutive failed pings over 5 minutes, or anything else that is deemed a good signal of the server being 'gone').
For your scenario of detecting when to run iadmin set_delay_server ... I think that's out of scope of this protocol itself. We are looking to add some basic monitoring to the iRODS grid (so the ZMT can draw some green circles or something), but we should have a design session around those goals as well - we don't want to start logging too much timeseries data and/or providing graphs or other things that existing monitoring/dashboarding tools do very well."
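To make the "utility function" idea a little more concrete, here is a minimal sketch in Python of the example given in the quote: three consecutive failed pings within five minutes. Everything here (the function name, the shape of the ping history, the defaults) is my own assumption for illustration, and not how the delay server is or will be implemented.

```python
from datetime import datetime, timedelta


def leader_presumed_dead(pings, now, window=timedelta(minutes=5), threshold=3):
    """Return True if the last `threshold` pings inside `window` all failed.

    `pings` is a list of (timestamp, succeeded) tuples, oldest first.
    This is a hypothetical helper, not part of iRODS.
    """
    # Keep only the outcomes of pings that fall inside the time window.
    recent = [succeeded for stamp, succeeded in pings if now - stamp <= window]
    if len(recent) < threshold:
        # Not enough evidence yet to declare the leader gone.
        return False
    # Dead only if none of the most recent `threshold` pings succeeded.
    return not any(recent[-threshold:])
```

The two arguments `window` and `threshold` are the "quickly" knob: shrink them and the successor promotes itself sooner, at the cost of more false alarms.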
imeta qu numeric comparison throws sql error
"We’re considering providing an additional, more human-friendly assessment of the reason for an SQL error being returned. I’ll reopen so we don’t lose track of this."
Postgres Version Compatibility
Updated with;
"we know postgres12 is now a problem with the current unixodbc (#5325)"
and also
"We are migrating to 11.3, so far so good."
Crash over long collection name
Was it fixed in 4.2.8? Perhaps earlier?
GCC compatibility
Loads of background changes to GCC compatibility here. Also interesting that the developer is using Ceph as a back end - I would like to find out more about that since the LibRADOS plugin isn’t officially released yet…
consolidate all configuration json into server_config.json
Should make 4.3 clearer to manage, especially for new people.
NOT IN syntax not supported in GenQuery
See below!
‘in’ in IN() is INVALID
Also, see below!
"This will be fixed by #3902"
genQuery uses ordered strstr() to find where condition keywords
"A result of this is that when handling the clause META_COLL_ATTR_NAME IN (‘originalVersionId’), genQuery will find the in in originalVersionId, note that it is not at the beginning of the condition (cp == condStart), then move on.
A workaround for this case is to always use the lower-case in keyword."
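The failure mode is easy to reproduce outside iRODS. The sketch below is Python rather than the C in genQuery, and is only an illustration of why a plain substring search trips over the value; the two helper functions are hypothetical and are not the fix planned in #3902.

```python
import re


def find_keyword_naive(condition, keyword):
    """Plain substring search, similar in spirit to C's strstr()."""
    return condition.find(keyword)


def find_keyword_token(condition, keyword):
    """Find `keyword` as a whole word, ignoring case and quoted literals."""
    # Blank out quoted literals, keeping the length so offsets still line up.
    masked = re.sub(r"'[^']*'", lambda m: " " * len(m.group(0)), condition)
    match = re.search(r"\b" + re.escape(keyword) + r"\b", masked, re.IGNORECASE)
    return match.start() if match else -1


condition = "META_COLL_ATTR_NAME IN ('originalVersionId')"
literal_start = condition.index("'")

# The naive search lands inside the quoted value ("orig-in-al").
naive = find_keyword_naive(condition, "in")
# The token-aware search lands on the operator itself.
token = find_keyword_token(condition, "in")
```

Here `naive` points past `literal_start`, that is, into the attribute value, while `token` points at the real operator.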
The issue has been open for a bit, but the baton was passed from Jason Capolsky to Kory, making him the third developer to be so honoured!
"This will be fixed by #3902"
move/copy of file between federated zones should copy metadata values
New discussion on this issue after a while. In the meantime, check out baton!
Bugzilla 168 - ichmod -r doesn’t set ACLs on sub-collections or contents in another zone
While investigating an inheritance issue, I tried out all the open inheritance reports and found this one can probably be closed, which was nice.
Closed Issues
closed on - 2021-08-24 16:52:53 codacy sweep 2021
closed on - 2021-08-19 12:30:03 upgrade from iRODS 4.1.11 to 4.2.8 and then 4.2.10 leaves you with two versions of irods-externals-nanodbc2
"This is by design - we want externals packages to be independently installable. We repackaged the same nanodbc2.13.0 more correctly (giving the 1-1 versioning). You can simply remove the unused-by-iRODS-4.2.10 irods-externals-nanodbc2.13.0-0-1.0-1.x86_64 package."
and also see
"A quick note on this:
Package managers worth their salt will mark nanodbc as automatically installed as a dependency. Once it is no longer required (i.e. once all packages depending on it are no longer installed), the package manager will recognize that it is automatically removable, and the administrator can remove it when doing housekeeping. In particular, apt will even tell you what packages can be automatically removed whenever you install anything via apt-get."
closed on - 2021-08-18 03:03:33 non-service-account icommands are noisy when hosts_config.json not readable
This issue was resolved in 4.2.9 by the following PR: #5403
"Resolution of hostnames via the hosts_config.json file was reimplemented and better protects server-side code from clients such as the icommands."
closed on - 2021-08-25 00:51:29 Add support for optional server configuration service endpoint
Interested to see this coming in 4.3.0;
"In support for the creation of a cloud-native iRODS implementation, the server configuration needs to be moved to an external service for access on agent initialization. An environment variable IRODS_SERVER_CONFIGURATION_ENDPOINT will direct the agent process to an endpoint for all server configuration. Should this environment variable not be present, the current server initialization process will ensue. A server configuration REST API has been created to provide the necessary configuration as an array of JSON objects, one for each configuration file."
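As I read the description, the decision boils down to "endpoint if the variable is set, otherwise carry on as before". A minimal sketch of that logic in Python, with loud caveats: only the environment variable name comes from the issue; the function, its two callables and the error handling are hypothetical, and the real server is not written in Python.

```python
import json
import os

ENDPOINT_VAR = "IRODS_SERVER_CONFIGURATION_ENDPOINT"  # name taken from the issue


def load_configuration(fetch_remote, load_local, environ=None):
    """Choose the configuration source the way the issue describes.

    fetch_remote(url) -> JSON text and load_local() -> list of dicts are
    hypothetical callables supplied by the caller.
    """
    environ = os.environ if environ is None else environ
    endpoint = environ.get(ENDPOINT_VAR)
    if not endpoint:
        # Variable absent: fall back to the existing initialisation path.
        return load_local()
    documents = json.loads(fetch_remote(endpoint))
    if not isinstance(documents, list):
        # The issue describes an array of JSON objects, one per config file.
        raise ValueError("expected a JSON array of configuration objects")
    return documents
```

Injecting the fetch and local-load steps keeps the sketch free of any guess about the REST API's URL layout or authentication.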
closed on - 2021-08-18 03:03:51 Running icommands with new user throws errors but still works
DNS/host caching fixed this apparently (I wonder how?);
This issue was resolved in 4.2.9 by the following PR: #5403
closed on - 2021-08-23 22:32:55 Stopping iRODS on the host will kill iRODS servers running in a container
Fixed by this commit;
- No longer kills containerized iRODS servers.
- Only allows one iRODS server to run on a single host.
Now, I admit to being a Docker newbie here, but will this not prevent people running multiple docker instances of iRODS, say in a docker compose setup, or in Kubernetes? Tell me why I am mistaken?
closed on - 2021-08-11 12:08:41 new api plugin - release proxy data object
This looks like it isn’t needed any more.
closed on - 2021-08-11 12:09:34 new api plugin - create proxy data object
"I think we ended up going in a different direction for this, yeah"
Python iRODS Client Activity
Open Issues
SSL certificate verify fails even if disabled in environment file
'Yes, it appears that this library is not currently paying any attention to the “irods_ssl_verify_server” value in the environment file during ssl_startup()'
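Until that is addressed, one option is to build the SSL context yourself from the environment file. A minimal sketch using only Python's standard ssl module; the helper name is mine, and I am assuming the usual iRODS values of "none", "cert" and "hostname" for irods_ssl_verify_server, with "hostname" as the fallback when the key is absent.

```python
import ssl


def ssl_context_from_environment(env):
    """Build an ssl.SSLContext honouring irods_ssl_verify_server.

    `env` is the parsed irods_environment.json as a dict. Hypothetical
    helper, not part of python-irodsclient.
    """
    verify = env.get("irods_ssl_verify_server", "hostname")
    context = ssl.create_default_context(
        cafile=env.get("irods_ssl_ca_certificate_file")
    )
    if verify == "none":
        # Order matters: hostname checking must be off before CERT_NONE is set.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    elif verify == "cert":
        # Validate the certificate chain but skip the hostname match.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_REQUIRED
    return context
```

The python-irodsclient README describes handing an ssl_context to the session when connecting; check how the version you run expects it before relying on this.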
Closed Issues
NFSRods Activity
Open Issues
Issue copying a file greater then 55Meg to nfsrods
The issue has been reproduced, and it seems like a lot of work in Jargon is required, which makes me wonder if other Jargon clients might also suffer from this bug?
Closed Issues
icommands Activity
Open Issues
Closed Issues
externals Activity
Open Issues
Closed Issues
closed on - 2021-08-10 08:54:19 Error compiling elasticlient
Self-closed issue, nothing to see here?
YODA Activity
Open Issues
[QUESTION] Problem with installation
TL;DR:
"It appears that the configuration example in our (allinone) documentation does not work as expected. Thanks for the heads up. We’ll review and update it - this issue has been registered in our internal bug tracker as YDA-4241."
Closed Issues
closed on - 2021-08-12 17:07:51 incorrect path to irods_environment.json in icat deployment role
I’m slightly embarrassed to admit this was me reporting an issue, forgetting that my local site Ansible had run first!
closed on - 2021-08-02 10:22:12 [BUG] missing revision in a research area folder
Seems fixed in 1.7?
If you think someone else would appreciate this newsletter, they can sign up at https://theresource.metadata.school/
One Yak was shaved in the making of this newsletter.