UIdaho Agricultural Statistics Newsletter

September 5, 2025

Welcome Back, Fall 2025

Fun and Useful Reads

Rather than “must reads” (I suspect we all have enough of those in our lives), here are some reads that are informative, relevant across many disciplines, and in most cases, not overly long.

How to choose your programming language. A short article from Nature to help new researchers decide how to allocate their most limited resource, time, towards learning new programming tools.

Six questions to ask before jumping into a spreadsheet. There is a famous expression about data: “Clean data sets are all alike; every messy data set is messy in its own way.” These guidelines can help you structure your spreadsheet to avoid all sorts of future problems with your data that will stymie analysis.

Some tips for learning statistics from Kareem Carr. One of his main points is that statistics takes time to learn; time that you must designate and set aside if you want to make progress in this arena.

Positron workshop [content]. The Posit team has put together a set of lessons online you can browse to learn more about using their newest IDE. If you love RStudio and don’t want to change, no problem! That’s a great piece of software that will continue to be supported. If you want to use other programming languages or do R development, Positron is worth considering.

If you do decide to explore Positron, Posit has developed a few within-app AI agents to assist in coding: Positron Assistant and Databot. Both use existing large language models (e.g. Anthropic’s Claude) and don’t require additional sign-up beyond what the LLM already requires.

An idiot’s guide to effective population size, published by Molecular Ecology (that is actually the article title). This is admittedly a rather niche topic not of interest to every person receiving this newsletter, but if you’ve ever tried to estimate effective population size, you know it ends up being surprisingly hard for what is supposed to be a standard population genetics concept (which the author calls “elegant but slippery”).

A new R package, ‘nematode’, was released. In the authors’ words:

Nematode communities serve as crucial bioindicators in ecological studies, reflecting soil health, ecosystem functioning, and trophic interactions. To standardize these assessments, we developed a computational toolkit for quantifying nematode-based ecological indicators, including metabolic footprints, energy flow metrics, and community structure analysis.

For goodness’ sake, sign up for the 2025 Posit Conference. Virtual registration for this 2-day conference is free (FREE!) for academic attendees. The schedule looks interesting: talks on Positron (their new IDE), R/Shiny, integrating large language models (LLMs) into your R workflow, and several meta sessions on how to make R reproducible and how to build collaborative R workflows across a large organization. It is held September 17-18, so register and watch it live, or use your (FREE) registration to have immediate access to the session recordings.

Now That Your Summer of Data Gathering is Over

I realize this header is a bit optimistic; people may be done (or close to done) with their summer trials, but now begins the long march of sample processing – in the seedhouse, next to a microscope, in a wet lab, or in a dry lab next to a collection of scales and ovens. Somewhere, graduate students, technicians, and temporary help are laboring over samples they want to extract data from and analyze. However, it will all eventually get done, and then what is next? Data preparation and analysis.

This office has talked at length about data organization to nearly everyone who walks through our door. If you are managing your data in spreadsheets, some of the best guidelines you can follow are those from Broman and Woo (2017). I would also recommend limiting the number of transformations (mathematical and non-mathematical) that occur within a spreadsheet, given how easy it is for mistakes to be introduced and go undetected for months or years, if ever. This is all I want to say on data organization for now, but the new researcher should know that this step will take up the majority of your time: expect to spend a considerable amount of time on data preparation and comparatively little on analysis of said data.
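One way to keep transformations out of the spreadsheet is to export the raw values and compute derived columns in a script, where each step is written down and can be rerun. The Python below is only a minimal sketch of that idea. The column names (`plot`, `wet_weight_g`, `dry_weight_g`), the numbers, and the `add_moisture` helper are all hypothetical, made up for illustration.

```python
import csv
import io

# Hypothetical raw export (made-up values). In practice this would be a CSV
# file saved from the spreadsheet, with no formulas or edits applied to it.
RAW_CSV = """plot,wet_weight_g,dry_weight_g
101,52.0,39.0
102,48.0,36.0
103,50.0,40.0
"""


def add_moisture(rows):
    """Return new rows with a derived moisture percentage (wet basis).

    The input rows are not modified, so the raw data stay exactly as recorded.
    """
    derived = []
    for row in rows:
        wet = float(row["wet_weight_g"])
        dry = float(row["dry_weight_g"])
        moisture_pct = round(100 * (wet - dry) / wet, 1)
        derived.append({**row, "moisture_pct": moisture_pct})
    return derived


raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
derived_rows = add_moisture(raw_rows)

for row in derived_rows:
    print(row["plot"], row["moisture_pct"])
```

Because the calculation lives in code rather than in a cell formula, a mistake in it can be found by reading the script, and fixing it does not require touching the recorded data.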

Now, on to the fun stuff: data analysis. With many new researchers, I commonly see this form of statistical malpractice: using p-values to decide which results to focus on, or even which to report at all, usually with the magic threshold of 0.05. This is not a recommended approach for many reasons, including how it contributes to publication bias. However, the most important reason I don’t recommend it is that it distracts researchers from their most important questions. Studies are conducted for specific reasons, asking specific questions.

  • How much does this feed ration increase energy-corrected milk?
  • Does this cover crop impact soil moisture to a noticeable and meaningful extent?
  • Which of these treatments is the most effective at suppressing potato germination?
  • What is the relationship between climate variables and preharvest sprouting?

Use your data analysis to answer the questions posed by your research. Report those results, in a table, in a figure, in the results section or somewhere else. Do not dismiss other results as unworthy of being reported because their p-values are too high to award the label “statistically significant.” Even GWAS studies, where p-values are the results, usually publish Manhattan plots showing all test results. You should report all the results because some differences may fall just short of the p-value threshold, or be meaningful but obscured by high variance, or just be extremely variable. Let readers see this and decide. Confidence intervals and p-values from hypothesis tests are tools to help support your conclusions, but they are not results by themselves. Report both estimates and p-values, and allow your audience to draw conclusions from them. This is particularly important in field studies; these are extremely noisy and beset with environmental variation. That variation is worthy of reporting. We have a short blog post on p-values that expands on these ideas. The American Statistical Association has also developed some comprehensive guidelines for appropriate usage of p-values.
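As a minimal sketch of what reporting everything can look like, the Python below computes an estimate, a confidence interval, and a p-value for each treatment compared against a control, then prints all of them with no filtering on the p-value. Everything in it is an assumption for illustration: the numbers are made up, the treatment names and the `compare_to_control` helper are hypothetical, and the interval uses a large-sample normal approximation. That approximation is too optimistic for the small samples typical of field trials, where a t-based interval or a model suited to the experimental design would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_to_control(control, treatment, level=0.95):
    """Difference in means (treatment - control) with a normal-approximation
    confidence interval and two-sided p-value (illustrative only)."""
    diff = mean(treatment) - mean(control)
    # Standard error of the difference, allowing unequal variances
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {
        "estimate": diff,
        "lower": diff - z_crit * se,
        "upper": diff + z_crit * se,
        "p_value": p_value,
    }


# Hypothetical measurements (made-up numbers, not real trial data)
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3]
treatments = {
    "treatment_A": [4.9, 5.2, 4.7, 5.0, 5.1, 4.8],
    "treatment_B": [4.3, 4.0, 4.6, 3.9, 4.5, 4.2],
    "treatment_C": [3.2, 5.4, 4.1, 5.0, 3.6, 4.9],
}

results = {
    name: compare_to_control(control, values)
    for name, values in treatments.items()
}

# Report every comparison; nothing is dropped for missing a p-value threshold
for name, r in results.items():
    print(f"{name}: difference = {r['estimate']:.2f} "
          f"(95% CI {r['lower']:.2f} to {r['upper']:.2f}), "
          f"p = {r['p_value']:.3f}")
```

Printing the interval next to each estimate lets a reader tell a small, precisely estimated difference from a possibly large one that high variance has obscured, which a bare “not significant” label would hide.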

In general, employing common sense with professional modesty and avoiding strong declarative statements when reporting and discussing experimental results can go a long way towards averting p-value abuse. — Esteemed Statistician Bill Price


Thanks for reading!

Julia & Harpreet
