The Resource - October 2020
Hello Reader, here is this month’s iRODS news and developments!
If you’re facing an issue with iRODS you’re not sure how to solve, please do drop me a line; if I’ve come across a solution or seen something relevant elsewhere I’ll do my best to let you know. Or just drop me a mail to say ‘Hi’. Always nice to hear from people, particularly in these pandemic times!
Important!
The newsletter is moving! The Resource will be moving away from the current domain to https://theresource.metadata.school/, in order to avoid any trademark, copyright or association issues. ** Next time you get this email it will be from metadata.school! **
I’d love your thoughts and feedback on how this newsletter could be better for you: https://docs.google.com/forms/d/e/1FAIpQLSfODAa7U4ST9U9Tuc6S1PlQhPtoiFybKzXVgtoVnkx7ISe41A/viewform?usp=sf_link
News
Are you registered for HacktoberFest? https://hacktoberfest.digitalocean.com/ I’m not sure I’m going to get the four required PRs done this month, but hey, it’s a fun idea!
Since the last newsletter, the Development Update came out: https://irods.org/2020/09/irods-development-update-september-2020/
Baton has had a 2.1.0 release ( https://github.com/wtsi-npg/baton/releases/tag/2.1.0 ). This has a workaround for Issue 5072 (so if you are using multiple uploads/operations against 4.2, you may want to update), some bugfixes, and adds 4.2.8 to the test matrix.
HTSLib developer John Marshall found the cause of an iRODS integration bug: https://twitter.com/jomarnz/status/1311249536740929536?s=20
New RFC! Static PEPs and Delay rules - https://github.com/irods/irods_rfcs/blob/master/0006_static_peps_and_delay_rules.md
Main Repository Activity
I’m trying out reporting these in reverse chronological order. Like it? Hate it? Let me know… I’m also not going to tag everything (although you might not think that from the list below!). I tend to be interested in, and hence report on, features and bugfixes rather than API specifics or refactoring.
New issues this month
pep_api_auth_response_pre can be triggered without _post or _except - https://github.com/irods/irods/issues/5201 My colleague has a deep dive into using the rules to determine if we can tell an end user if they are using the wrong auth mechanism. Out of interest, did you know that once you have SSL configured the end user can choose to use PAM or native auth in their environment file? Can be useful, except when those same end users don’t read the mails telling them to switch…
Un-closed replicas should be marked stale on agent teardown - https://github.com/irods/irods/issues/5195 “If rcDataObjClose is not called before disconnecting, open replicas are left in an unknown state. In 4-2-stable/master, this results in an intermediate replica. In 4.2.8 and back to the beginning of time, the replica is left in its previous state (or good, if new) but this is possibly not true; and even if it is true, the catalog is not necessarily updated properly to reflect the data.”
imeta does not print an error when failing to parse a path string - https://github.com/irods/irods/issues/5186
imeta ls[w] skips path validation for collections if detailed output is requested - https://github.com/irods/irods/issues/5185
imeta addw prints “AVU added to x data-objects” to stdout for valid paths regardless of success/error - https://github.com/irods/irods/issues/5184
release activities for 4.2.9 - https://github.com/irods/irods/issues/5183 Looks like 4.2.9 is on the horizon! Will it be out before the next newsletter?
rsDataObjRepl not returning errors from rsDataObjClose - https://github.com/irods/irods/issues/5179 I think, on reading the description, this is a bug in the code between 4.2.8 and 4.2.9, so hopefully by the time we get 4.2.9 this will be ironed out.
iquest matches the select keyword when it shouldn’t - https://github.com/irods/irods/issues/5178 If your collection or filename contains ‘select’ you might not get what you expect!
iphymv (et al?) returns SYS_USER_NO_PERMISSION when it should be ENOENT (or equivalent) - https://github.com/irods/irods/issues/5177 (Incorrect error when attempting to iphymv an object that is absent from the filesystem)
Request for icommands to be built for Ubuntu 20.04 - https://github.com/irods/irods/issues/5174 More support for Ubuntu 20.04! I think the consortium would rather put their effort into the single irods binary instead, which replaces all the icommands and should be easier to package, but crucially, is not ready yet! We’ll see.
Default number of threads advanced setting should be 3 - https://github.com/irods/irods/issues/5171 Interesting; if you read Terrell’s paper showing the performance improvements in 4.1.9, you might remember they showed that 3 threads was enough to fill a 10G NIC. Now, you might be running your iRES/Consumers with 25G, bonded 25G, or (steady on) bonded 100G, but that’s worth taking into account.
zonereport does not include resc_id information - https://github.com/irods/irods/issues/5170 / https://github.com/irods/irods/issues/5159 Which, as the issue says, makes it hard to use it to assemble the tree later. Originally I think the idea was to have a way to take an izonereport and feed it into another tool to create an identical zone for testing. That would be super, but I don’t think anyone has done it yet. Perhaps you might?
data object size changed to zero (0) after re-replicating modified data object with irepl - https://github.com/irods/irods/issues/5160 Potentially a worry if you are replicating large files, whether by rule logic or ‘irepl’. We had this issue in 4.1.12, but we’ve not seen it in 4.2, yet…
ilsresc -l prints CAT_NO_ROWS_FOUND with exactly 350 resources, with return code 0 - https://github.com/irods/irods/issues/5155 So make sure you have one more, or one less! Not a problem many have, perhaps, but frustrating when you have scripted around the output!
reLog filling with stat error for rei file and stack traces - https://github.com/irods/irods/issues/5153 The interesting thing about this one is the discussion around how to ensure that only one server in a Zone runs the Rule Engine process. Current method is to kill the reServer process if you have more than one Provider, otherwise you get a lot of these logs, which is somewhat manual.
Add API plugin that implements the unix “touch” command - https://github.com/irods/irods/issues/5152 - closed on 2020-09-22 18:26:41 Prep work for the ‘itouch’ command in 4.2.9 - see https://github.com/irods/irods/issues/4694 (now closed)
Grandpa dies from file descriptor exhaustion - https://github.com/irods/irods/issues/5144 Grandpa, here, is the Consumer target across a federation link, where there were an excessive number of igets (hundreds). If you haven’t protected your zone with rate limiting, as in Tony Edgin’s talk at the UGM in 2019, it’s something to consider.
Do we need an imeta addw for collections? - https://github.com/irods/irods/issues/5131
imeta addw returns non-zero on seeming success - https://github.com/irods/irods/issues/5101
imeta now exits when it encounters an empty line on STDIN, rather than ^D - https://github.com/irods/irods/issues/5081
iput under extreme ulimit pressure gives assorted failures and hangs - https://github.com/irods/irods/issues/5074 People in HPC or container environments, take note!
imeta qu should return non-zero on Unrecognised input - https://github.com/irods/irods/issues/5021
iRODS doesn’t handle or report failure of one irodsServer process - https://github.com/irods/irods/issues/4947 Systemd, the gift that keeps giving.
Support for Ubuntu 20.04 - https://github.com/irods/irods/issues/4883
iquest fails when select is used in argument string - https://github.com/irods/irods/issues/4697
Delay server should store rei context in catalog - https://github.com/irods/irods/issues/4428 Part of the work to move all the rules state files to within the catalog in 4.2.9.
new api plugin - finalize data transfer - https://github.com/irods/irods/issues/4331
Support the ‘OR’ operator in the GenQuery interface - https://github.com/irods/irods/issues/4069
Add systemd support - https://github.com/irods/irods/pull/3999 Leaving this pull request in here because it’s an ongoing discussion; if you have systemd experience or opinions, please do check it out.
ilsresc and imeta errors displayed directly from irods::exception::what() using printErrorStack() - https://github.com/irods/irods/issues/3994
replace GenQuery ad-hoc parsing code with flex/bison parser - https://github.com/irods/irods/issues/3902
‘in’ in IN() is INVALID - https://github.com/irods/irods/issues/3886
genQuery uses ordered strstr() to find where condition keywords - https://github.com/irods/irods/issues/3064
move packedReis to db, add delay server boolean to server_config.json - https://github.com/irods/irods/issues/3049
iput should be more helpful re: OVERWRITE_WITHOUT_FORCE_FLAG - https://github.com/irods/irods/issues/2383
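Several of the GenQuery items above (5178, 4697, 3064) come down to finding keywords by substring search. As a rough illustration of why that goes wrong, here is a minimal Python sketch. It is not the iRODS parser; the function names, the example query and the path in it are all made up for this newsletter. It compares a plain substring search with one that only accepts a keyword as a whole word outside a quoted value.

```python
import re
from typing import List

# Hypothetical example query: the path deliberately contains the word "select".
QUERY = "select DATA_NAME where COLL_NAME like '/tempZone/home/alice/select%'"


def naive_keyword_positions(query: str, keyword: str) -> List[int]:
    """Return every offset where `keyword` appears as a plain substring.

    This mimics the fragile strstr()-style approach: it cannot tell a real
    keyword from the same letters inside a quoted path.
    """
    lowered = query.lower()
    positions = []
    start = lowered.find(keyword)
    while start != -1:
        positions.append(start)
        start = lowered.find(keyword, start + 1)
    return positions


def token_keyword_positions(query: str, keyword: str) -> List[int]:
    """Return offsets where `keyword` is a whole word outside single quotes.

    The pattern consumes each quoted literal as one match, so keywords
    hidden inside a literal are skipped rather than reported.
    """
    pattern = re.compile(r"'[^']*'|\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [
        m.start()
        for m in pattern.finditer(query)
        if not m.group(0).startswith("'")
    ]


if __name__ == "__main__":
    print("naive:", naive_keyword_positions(QUERY, "select"))
    print("token:", token_keyword_positions(QUERY, "select"))
```

The naive version reports the ‘select’ inside the path as well as the real keyword, while the token-aware version reports only the real one. A proper grammar, as proposed in issue 3902, would go further than this sketch does.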
Closed issues
Bugs and functionality improvements that were addressed in some way this month.
Need a way to drain processes on a resource server non-disruptively - https://github.com/irods/irods/issues/5188 - closed on 2020-10-13 13:34:41 A dive into the assorted ways to non-disruptively take a Consumer out of action. Didn’t go where I expected!
Race conditions in logging code lead to terminating with uncaught exception of type boost::filesystem::filesystem_error: boost::filesystem::directory_iterator::operator++: Not a directory: "/proc/self/fd" - https://github.com/irods/irods/issues/4943 - closed on 2020-09-12 03:12:08
Cannot use replica library on the server-side - https://github.com/irods/irods/issues/5156 - closed on 2020-09-22 18:26:51
Consider adding functions that allow leaf resource names to be converted to replica numbers and vice versa - https://github.com/irods/irods/issues/5143 - closed on 2020-09-22 19:00:18
Consider adding overloads to the replica library that accept leaf resource names instead of replica numbers - https://github.com/irods/irods/issues/5142 - closed on 2020-09-22 19:00:03 Both of these have got pull requests so should be in 4.2.9, and I think are part of addressing 5170.
Remove support for updating the mtime of data objects from irods::filesystem::last_write_time() - https://github.com/irods/irods/issues/5119 - closed on 2020-09-15 13:02:12
Remove rcConnect.h dependency from filesystem implementation - https://github.com/irods/irods/issues/5118 - closed on 2020-09-15 13:01:11
irods::filesystem::data_object_checksum() needs to return the size of the latest good replica - https://github.com/irods/irods/issues/5117 - closed on 2020-09-15 13:01:54
irods::filesystem::data_object_size() needs to return the size of the latest good replica - https://github.com/irods/irods/issues/5116 - closed on 2020-09-15 13:02:02
imeta does not correctly parse -lC - https://github.com/irods/irods/issues/5111 - closed on 2020-10-08 18:58:04 For those not intimately familiar with -lC that’s listing collection metadata. You can put metadata on collections? Indeed you can, but until 4.2.9, you couldn’t easily see the date stamp of when you did so.
irods::filesystem::last_write_time() needs to return the time of the most recent good replica - https://github.com/irods/irods/issues/5105 - closed on 2020-09-15 13:01:35
Create proxy objects for dataObjInfo_t - https://github.com/irods/irods/issues/5104 - closed on 2020-09-14 13:19:10
Add library that allows easy manipulation of replica information - https://github.com/irods/irods/issues/5103 - closed on 2020-09-12 00:10:10
iadmin lr returns stale information: resc_objcount - https://github.com/irods/irods/issues/5099 - closed on 2020-09-12 03:05:22
Document the user group administration library - https://github.com/irods/irods/issues/5086 - closed on 2020-09-15 13:01:21
CAT_NO_ROWS_FOUND in log for successful iput of new file - https://github.com/irods/irods/issues/5064 - closed on 2020-09-24 12:53:11
Merge the update_collection_mtime REP into the server - https://github.com/irods/irods/issues/5063 - closed on 2020-09-15 13:02:59
filesystem::last_write_time must require an explicit replica number or update all replicas - https://github.com/irods/irods/issues/5061 - closed on 2020-09-15 13:15:06
Add switch for using rclOpenCollection/rclReadCollection in collection iterator - https://github.com/irods/irods/issues/5049 - closed on 2020-09-15 13:10:43
Add a RESOURCE_SKIP_VAULT_PATH_CHECK_ON_UNLINK resource property - https://github.com/irods/irods/issues/5030 - closed on 2020-09-23 17:45:31
Create a new asynchronous API plugin interface - https://github.com/irods/irods/issues/5007 - closed on 2020-09-16 13:57:57
add new msiTouch microservice - https://github.com/irods/irods/issues/4669 - closed on 2020-09-22 18:28:40
new multipart transfer mechanism - https://github.com/irods/irods/issues/4336 - closed on 2020-10-08 17:05:22
add client support for manually update the mtime of collections - https://github.com/irods/irods/issues/4190 - closed on 2020-09-22 18:29:41
deb package should be lintian clean - https://github.com/irods/irods/issues/837 - closed on 2020-09-15 02:24:44
get rpm build into state of rpmlint error-free - https://github.com/irods/irods/issues/826 - closed on 2020-09-15 02:25:10
Twitter Activity
Nothing as exciting as the multi-tweet job description this month! (Also, Welcome Markus!)
BIO-IT World had some familiar faces from last year’s UGM!
iRODS will present from BIO-IT World this year! We’ll be presenting a panel featuring researchers from #iRODS, @NIEHS + @BMSNews . #BioITWorld #DataManagement #DataStorage http://bit.ly/iRODSatBioIT20 https://twitter.com/irods/status/1308040828481343488?s=20
Oleg Moiseyenko from @BMSNews will dive into the use of @awscloud + #iRODS in the #datamanagement of #NGS data for #cancerresearch. Catch this preso at 9:40 a.m. EDT on 10/7 at @BioITWorld. #BioIT20 #AWS https://t.co/InbAOmmDpg pic.twitter.com/wqUyyid1sX
— iRODS (@irods) October 2, 2020
With an open, policy-based platform, #metadata can be elevated beyond assisting in search + #discoverability. @TerrellRussell , Chief Technologist at #iRODS will explore using metadata to drive #datamanagement at 9 a.m. EDT on 10/7 at @BioITWorld . #BioIT20 http://bit.ly/iRODSatBioIT20 https://twitter.com/irods/status/1309139289779904512?s=20
At 9:20 a.m. on 10/7, @NIEHS’ Mike Conway will discuss the process of building a @NIH Data Commons for #researchdata using #iRODS at @BioITWorld . #datamanagement #datastorage #BioIT20 http://bit.ly/iRODSatBioIT20 https://twitter.com/irods/status/1310583821109276672?s=20
Love to know the backstory to this! It led to another epic thread quoting from the paper (one presumes). Are epic threads going to be the new iRODS comms platform? Seems possible!
New Report from @ArmyERDC ERDC/ITL SR-20-12 Integrated Rule-Oriented Data System (iRODS) and High Performance Computing (HPC) Architecture Design https://twitter.com/irods/status/1310930606973317120?s=20
If you think someone else would appreciate this newsletter, they can sign up at https://theresource.irods.academy/
7 Yaks were shaved in the making of this newsletter.