2024 Posit/RStudio Conference Report
This is a very programming-focused newsletter! We promise they won't always be like this, but we are fresh off the Posit/RStudio Conference (called 'Posit Conf') and still thinking about these things.
Posit PBC (the company formerly known as "RStudio") is beta-testing a new IDE (integrated development environment, i.e. the software you write and run code in) called "Positron". It's a fork of VS Code and hence looks a lot like VS Code. Please know: if you are happy with RStudio (the IDE), there is no need to change. RStudio will continue to be a fully supported IDE for some time. Why try Positron? It's fun to try out new things, but for realz: it has better native support for programming languages other than R. While running Python is technically possible in RStudio, it's an ordeal. Why use Positron instead of VS Code? As far as I can tell, it offers better integration of the tools we love about RStudio: the help pane to look up function syntax, a pane listing all objects loaded into the environment and their properties, and a plot viewer. It seems like Positron is for data scientists, while VS Code is for developers. Here is a comprehensive review of Positron and a 6-minute review video from a beta tester. You can find download links for Positron here.
There was quite a bit of discussion about Quarto (the R Markdown replacement): new use cases, and how easy it makes producing websites, reports, books, and dashboards. We have been using it to deploy our "Introduction to R" course homepage and other resources because it's so easy and flexible. You can also automate the creation of static reports with it, which is far more efficient than cutting and pasting R output for your boss or colleagues. Here's the Quarto homepage, which has many resources for getting started. The Posit YouTube channel also has an entire playlist dedicated to generating nice outputs. How to leverage Quarto was also discussed at the 2024 Joint Statistical Meetings. One of the coolest extensions written for Quarto is a 'storytelling' tool: check out this neat example of Napoleon's ill-fated march through Russia (accompanying a seminal and still utterly compelling infographic).
Other Interesting Talks at the Posit Conference:
NASA Data
A new Python package, 'earthaccess', has been released to ease the burden of downloading NASA data sets. If you have ever tried to do this, you may have noticed how surprisingly cumbersome and complicated the process can be, so any help is welcome! NASA also has a number of useful tutorials on accessing data on their 'Openscapes' website.
Data Visualization
There was a talk on how to make unique and interesting data visualizations. The speaker went through why this matters, how to approach it, and how to find inspiration and ideas. TL;DR: it's okay to repurpose elements of visuals you like; try to be innovative, and identify your personal sources of inspiration. The talk may well contain the most compelling collection of data visualizations I have ever seen. Here are the slides and other resources from that talk.
“Big” data
A keynote by Hannes Mühleisen of DuckDB Labs argued that 'big data' is slowly ceasing to be a thing, or at least becoming relatively uncommon, because computing hardware development is outstripping the pace of data set growth. While it is sometimes difficult to share large files over email, importing and analyzing those same files in any modern analytical software is often quite trivial. For R, there is a general rule of thumb: if the data set takes up one-third of your machine's physical memory (RAM) or less, all is fine. Anything larger may cause your computer to grind to a halt or process commands at a glacial pace. But we can also get around these limitations by using Parquet files (pronounced "par-kay") in tandem with the arrow package. If you prefer Python, the pyarrow library (Arrow's Python implementation) reads and writes Parquet files too. Parquet is a columnar file format that can compress your data dramatically. The arrow libraries allow researchers to read their file content without loading the entire data set into memory. I have been able to easily load a 7 million row data set into an R session running on a Windows desktop computer (16 GB RAM, 3.60 GHz Intel Xeon processor). Honestly, Slack is using more memory than a typical R session. From my point of view, Parquet + arrow is basically magic.
Most (all?) workshop and talk materials are already posted online and collated by Posit in their conference review. All recorded Posit Conf talks will be posted on the Posit YouTube channel in 3 months.
Thanks for reading!
Julia & Harpreet