Causality in observational data, dichotomization issues and more
1 - CausalImpact
Drawing causal conclusions from observational data is hard. With that in mind, a team at Google developed CausalImpact, a causal inference tool for time series data.
https://google.github.io/CausalImpact/CausalImpact.html
For example, given a website, we might want to know whether changing some aesthetic parameters increases or decreases a metric of interest. The idea is to use a set of predictors to forecast what the metric would have been without the change, and then do causal inference within a Bayesian framework by comparing that forecast with what was actually observed.
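To make the idea concrete, here is a minimal sketch in Python. It is not the CausalImpact API and not its Bayesian model: it swaps in an ordinary least squares line fitted on the period before the change, and only returns a point estimate with no uncertainty. The data are simulated, and all names (fit_line, estimate_effect, true_lift) are made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_effect(predictor, metric, change_point):
    """Average gap between the observed metric and its forecast after the change."""
    # Learn the predictor -> metric relationship on pre-change data only.
    intercept, slope = fit_line(predictor[:change_point], metric[:change_point])
    # Forecast what the metric would have looked like without the change.
    forecast = [intercept + slope * x for x in predictor[change_point:]]
    observed = metric[change_point:]
    gaps = [obs - fc for obs, fc in zip(observed, forecast)]
    return sum(gaps) / len(gaps)


# Simulated data (an assumption for illustration): the predictor is not
# affected by the change, and the metric gets a constant lift afterwards.
rng = random.Random(0)
n_days, change_point, true_lift = 100, 70, 5.0
predictor = [50 + 10 * rng.random() for _ in range(n_days)]
metric = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in predictor]
for day in range(change_point, n_days):
    metric[day] += true_lift

effect = estimate_effect(predictor, metric, change_point)
print(f"estimated average effect: {effect:.2f} (simulated lift: {true_lift})")
```

The key assumption, here as in the real tool, is that the predictors themselves are not affected by the change. Otherwise the forecast no longer represents the world without it.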
Thank you to Elham for mentioning this tool to me!
2 - An unexpected proof
Randomness is everywhere, but that does not mean it lacks structure. One way to study this is through thresholds: at what point does a given structure or property show up? For example, in a random graph on n vertices where each edge is present with probability p, the threshold for containing a Hamiltonian cycle is p = log(n)/n. Thresholds of this kind are difficult to calculate, so mathematicians usually calculate an easier quantity, the expectation threshold. Park and Pham, two mathematicians, proved that the two thresholds are in fact close to each other: they differ by at most a logarithmic factor.
https://www.quantamagazine.org/elegant-six-page-proof-reveals-the-emergence-of-random-structure-20220425/
The essay above gives a very intuitive explanation of the proof.
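The threshold behaviour can be seen in a small simulation. The sketch below samples random graphs at a few multiples of log(n)/n and checks exactly whether each one contains a Hamiltonian cycle. Thresholds are statements about large n, so with n = 9 (chosen only to keep the exact check fast) this shows a trend, not the theorem. The function names are made up for illustration.

```python
import math
import random


def random_graph(n, p, rng):
    """Adjacency bitmasks of a random graph: each edge is present with probability p."""
    adj = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u] |= 1 << v
                adj[v] |= 1 << u
    return adj


def has_hamiltonian_cycle(adj):
    """Exact check by dynamic programming over vertex subsets (small n only)."""
    n = len(adj)
    if n < 3:
        return False
    full = (1 << n) - 1
    # reach[mask] is a bitmask of the vertices at which a path can end when it
    # starts at vertex 0 and visits exactly the vertices in mask.
    reach = [0] * (1 << n)
    reach[1] = 1
    for mask in range(1, 1 << n, 2):  # odd masks are the ones containing vertex 0
        ends = reach[mask]
        if not ends:
            continue
        for v in range(n):
            if (ends >> v) & 1:
                unvisited_neighbours = adj[v] & ~mask
                while unvisited_neighbours:
                    low = unvisited_neighbours & -unvisited_neighbours
                    reach[mask | low] |= low
                    unvisited_neighbours ^= low
    # A Hamiltonian path from vertex 0 closes into a cycle if its end is adjacent to 0.
    return bool(reach[full] & adj[0])


def hamiltonian_fraction(n, p, trials, rng):
    """Fraction of sampled graphs that contain a Hamiltonian cycle."""
    hits = sum(has_hamiltonian_cycle(random_graph(n, p, rng)) for _ in range(trials))
    return hits / trials


rng = random.Random(0)
n = 9
threshold = math.log(n) / n
for multiple in (0.5, 1.0, 2.0, 3.0):
    p = min(1.0, multiple * threshold)
    fraction = hamiltonian_fraction(n, p, 100, rng)
    print(f"p = {multiple:.1f} * log(n)/n = {p:.2f}: {fraction:.2f} Hamiltonian")
```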
3 - Dichotomization
Given a continuous variable and an outcome, how should one study their association? By dichotomizing? In the essay below I show how dichotomization can be problematic and how it can be gamed in the pursuit of statistically significant results.
https://chronchi.github.io/blog/2022/04/04/dichotomization-part1.html
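As a quick illustration of the gaming problem (a sketch under simple assumptions, not the analysis from the essay), the simulation below draws a continuous variable and an outcome that are independent, so any "significant" association is a false positive. It compares splitting at a cutpoint fixed in advance (the median) with trying nine cutpoints and keeping whichever one works. The test is a Welch-type comparison with a normal approximation, and all names are made up for illustration.

```python
import math
import random


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def welch_significant(group_a, group_b, critical=1.96):
    """Two-sided Welch-type test at roughly the 5% level (normal approximation)."""
    if len(group_a) < 2 or len(group_b) < 2:
        return False
    mean_a, var_a = mean_and_variance(group_a)
    mean_b, var_b = mean_and_variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if standard_error == 0:
        return False
    return abs(mean_a - mean_b) / standard_error > critical


def significant_at(x, y, quantile):
    """Dichotomize x at the given quantile and compare the outcome between groups."""
    cut = sorted(x)[int(quantile * len(x))]
    low = [yi for xi, yi in zip(x, y) if xi < cut]
    high = [yi for xi, yi in zip(x, y) if xi >= cut]
    return welch_significant(low, high)


def false_positive_rates(n_sims=300, n=200, seed=0):
    """False positive rate with a fixed median split vs. a search over cutpoints."""
    rng = random.Random(seed)
    quantiles = [q / 10 for q in range(1, 10)]  # 0.1, ..., 0.9 (includes the median)
    fixed_hits = 0
    searched_hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x by construction
        fixed_hits += significant_at(x, y, 0.5)
        searched_hits += any(significant_at(x, y, q) for q in quantiles)
    return fixed_hits / n_sims, searched_hits / n_sims


fixed_rate, searched_rate = false_positive_rates()
print(f"median split only:      {fixed_rate:.2f}")
print(f"best of nine cutpoints: {searched_rate:.2f}")
```

Because the median is one of the nine candidates, the searched rate can never be lower than the fixed one. How much higher it gets depends on how many cutpoints are tried.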
4 - Fast R installation with binaries in Ubuntu
Ever wanted to set up R very quickly on a server or cluster? Look no further: by providing CRAN packages as Ubuntu binaries, r2u speeds up the installation process and makes shared library problems go away!
https://eddelbuettel.github.io/r2u/
5 - THOT: a tool for data management and analysis
Data is everywhere, but how can we analyse it in a reproducible way? Moreover, projects constantly generate new data, so how can we include it in an existing analysis pipeline? THOT is open source software that helps you organize your analysis and extend it with Python scripts. It is currently in beta, and support for R scripts is planned for the near future. It might be worth keeping an eye on!
https://thot-data.com/