5 minutes of Data Science

November 14, 2022

Week 45

5 Minutes of Data Science - week 45

Highlights from November 07 to November 13

Foreword

Stable diffusion has been all the rage lately, and last week saw an interesting release: a search engine built on that technology. It still seems to be in an early phase, but it is interesting nonetheless.

There was also a massive migration of AI-related folks from Twitter to the fediverse, namely Mastodon. There are many servers for people in this area, and it has been great to connect there. I've moved to the sigmoid.social server.

A simple toot (the fediverse equivalent of a tweet) asking about NLP tools for text annotation also led me to a page listing many annotation tools.

Come say hi on Mastodon and stay for the community. Twitter is still an option.

Cheers, Pedro.


Blogs

  • The pursuit of AI education - past, present, and future, by DeepMind
  • Characterizing Emergent Phenomena in Large Language Models, by Google AI
  • Multi-layered Mapping of Brain Tissue via Segmentation Guided Contrastive Learning, by Google AI
  • ReAct: Synergizing Reasoning and Acting in Language Models, by Google AI
  • Infinite Nature: Generating 3D Flythroughs from Still Photos, by Google AI
  • How a universal model is helping one generation of robots train the next, by Amazon Science
  • How scarce computing shaped Carlos Huertas's career, by Amazon Science
  • Method predicts bias in face recognition models using unlabeled data, by Amazon Science
  • Subspace Recovery from Heterogeneous Data with Non-isotropic Noise, by Apple Machine Learning
  • A Large-Scale Observational Study of the Causal Effects of a Behavioral Health Nudge, by Apple Machine Learning

Podcasts

  • Evolution of data platforms (Ep. 209), by Data Science At Home
  • Your Consent is Worth 75 Euros a Year, by Data Skeptic
  • Hybrid computing with quantum processors, by Practical AI
  • The Evolution of the NLP Landscape with Oren Etzioni - #598, by The TWIML AI
  • Data Journalism in the Age of COVID-19, by DataFramed
  • Product Owners in Data Science - Anna Hannemann, by Data Talks

YouTube

  • Consciousness and the Chinese Room [Special Edition] (CHOLLET, BISHOP, CHALMERS, BACH), by Machine Learning Street Talk
  • Using machine learning to optimise agriculture in Brazil | AI by you - Vitor’s story, by DeepMind
  • Using AI to manage resources in Africa | AI by you - Arnol’s story, by DeepMind
  • Can AI help to unlock the mysteries of the mind? | AI by you - Weronika’s story, by DeepMind
  • Transforming medicine with AI | AI by you - Sneha’s story, by DeepMind
  • How can AI help us fight climate change? | AI by you - Julia’s story, by DeepMind
  • AI for everyone needs AI by you, by DeepMind

Reddit

  • hot take: forget data science, we need more analysts, at r/Data Science (💬187)
  • Seems a bit crazy, 400 applications within 3 days! Does this put anyone else off applying?, at r/Data Science (💬186)
  • I want to post for those just coming to this sub... People will shit on you, tell you to do more/get experience, give snarky comments. KEEP GOING, at r/Data Science (💬105)
  • ML/AI role as a disabled person, at r/Machine Learning (💬54)
  • A relabelling of the COCO 2017 dataset, at r/Machine Learning (💬21)
  • Current Job Market in ML, at r/Machine Learning (💬100)
  • Can you help with this exercise of multivariate statistics? I really can’t get anything out of it., at r/Ask Statistics (💬11)
  • Help decide on a model for my research proposal, at r/Ask Statistics (💬9)
  • What do statisticians/data scientists do with SQL and Python and what advantages do these have over R?, at r/Ask Statistics (💬13)
  • Need some suggestion for my thesis topic titled as Crack damage detection, at r/Latest in ML (💬0)

GitHub Jupyter notebook trends

  • nn-zero-to-hero: Neural Networks: Zero to Hero
  • Mubert-Text-to-Music: A simple notebook demonstrating prompt-based music generation via Mubert API
  • whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  • micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
  • mlbookcamp-code: The code from the Machine Learning Bookcamp book and a free course based on the book
  • Made-With-ML: Learn how to responsibly develop, deploy and maintain production machine learning applications.
  • amazon-sagemaker-examples: Example📓Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using🧠Amazon SageMaker.
  • Complete-Python-3-Bootcamp: Course Files for Complete Python 3 Bootcamp Course on Udemy
  • ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
  • annotated_deep_learning_paper_implementations: 🧑‍🏫59 Implementations/tutorials of deep learning papers with side-by-side notes📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...),🎮reinforcement learning (ppo, dqn), capsnet, distillation, ...🧠
  • google-research: Google Research
  • nerf: Code release for NeRF (Neural Radiance Fields)
  • tsfresh: Automatic extraction of relevant features from time series
  • be-theboss-in-python: This repo helps you to be the boss in Python.
  • yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
  • pytorch-Deep-Learning: Deep Learning (with PyTorch)
  • mlops-course: A project-based course on the foundations of MLOps to responsibly develop, deploy and maintain ML.
  • tsai: Time series Timeseries Deep Learning Machine Learning Pytorch fastai | State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
  • handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
  • py: Repository to store sample python programs for python learning
  • azureml-examples: Official community-driven Azure Machine Learning examples, tested with GitHub Actions.
  • examples: TensorFlow examples
  • pytorch-seq2seq: Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.

GitHub Python trends

  • tinygrad: You like pytorch? You like micrograd? You love tinygrad!❤️
  • langchain: ⚡Building applications with LLMs through composability⚡
  • lama-cleaner: Image inpainting tool powered by SOTA AI Model. Remove any unwanted object, defect, people from your pictures or erase and replace(powered by stable diffusion) any thing on your pictures.
  • recommenders: Best Practices on Recommendation Systems
  • awesome-python: A curated list of awesome Python frameworks, libraries, software and resources
  • ColossalAI: Colossal-AI: A Unified Deep Learning System for Big Model Era
  • devops-exercises: Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
  • diagrams: 🎨Diagram as Code for prototyping cloud system architectures
  • CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
  • latexify_py: Generates LaTeX math description from Python functions.
  • keras: Deep Learning for humans
  • alphafold: Open source code for AlphaFold.
  • encodec: State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
  • LibreTranslate: Free and Open Source Machine Translation API. 100% self-hosted, offline capable and easy to setup.

See you next week!
