January 2025 Outlook: Gen AI, Working with Big Data
Generative AI
I recently started using GitHub Copilot, an AI tool to help with coding (not to be confused with Microsoft Copilot, a completely different product that is also owned by Microsoft!). The first thing I was prompted to do was watch a few short videos on how to use the tool and how different parts of the app were designed to support coding. These were helpful and got me thinking more about ‘prompt engineering’ - that is, how to ask questions of AI chatbots that actually yield useful answers. This is turning out to be much harder than I originally thought. While I probably could have figured it out through trial and error, I did benefit from a wee bit of instruction. That seems to be the situation with most generative AI tools: sure, these applications can be useful, but there is a bit of a learning curve to get there.
Unfortunately, much of the chatter around generative AI is not this forthright. There is a tremendous amount of hype around generative AI, with proponents insisting that this technology is changing the world as we speak and that if you don’t get on board now, you will be left behind. It’s worth noting that there are very strong financial incentives to hype generative AI, since it is an expensive technology that has been bleeding cash since its inception. It is an impressive technology that will probably turn out to be helpful, but when, and to what degree, is unknown at this time. I wrote a blog post detailing some of the concerns about the actual utility of generative AI, as well as some of its known drawbacks. Over the next year, I plan to review and report on generative AI technologies: which ones help scientists, and which ones are a waste of time and money?
Big Big Data
There are a number of projects being conducted at the University of Idaho that will generate (or have generated) large data sets for making sense of our agricultural systems. Before anyone goes running off to spend hours learning how to use the UI high-performance computing cluster, consider what can be done on your personal home computer. Computing power has grown substantially over the past few years, outstripping data growth. I wrote up (another) short blog post detailing what I do when I am working with a larger-than-average data set in R. There are some easy solutions that make importing, wrangling, and analyzing large data sets not just possible, but routinely smooth in R.
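As one small illustration of the kind of approach that helps (a minimal sketch using the data.table package - one common option, not necessarily what the blog post recommends; the data set here is simulated for the example):

```r
library(data.table)  # assumes data.table is installed

# Simulate a largish CSV on disk, standing in for a real trial data set
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(treatment = rep(c("A", "B"), each = 5e4),
                  yield     = rnorm(1e5, mean = 50)), tmp)

# fread() imports delimited files much faster than base read.csv()
dt <- fread(tmp)

# data.table aggregates efficiently, without copying the full object
summary_dt <- dt[, .(mean_yield = mean(yield)), by = treatment]
print(summary_dt)
```

The same pattern - a fast reader plus in-memory tools that avoid unnecessary copies - covers a surprising share of "big" agricultural data sets on an ordinary laptop.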
Other Stuff
Here is a brief, bare-bones introduction to R. We love the simplicity and brevity of this resource!
Out of the University of Idaho’s College of Natural Resources is this excellent guide to reproducible data science. What is that, you ask? It is being able to take your scripts and data sets and fully repeat an analysis, getting results identical to those from the original run. This is surprisingly hard, and these resources will help you achieve it. Remember, this effort is most likely to help future you, when you have to revisit old scripts and make sense of your own past decisions.
A new R package to support incorporating spatial covariates into linear modeling. This office is a big advocate of accounting for spatial variation (e.g. plot position). Doing so can improve both the accuracy and precision of your estimates!
A new R package with all sorts of fun, preset ggplot themes! My favorite is the “Barbie” theme, but there are also Game of Thrones, Avatar, The Simpsons, and Harry Potter themes, and much more.
Thanks for reading!
Julia & Harpreet