Wasting less time to get more done
Thoughts
I finished another Successful Data Science Projects course last week. We had some very productive discussions about the whole process from early stage through to deployment, and the common problems along the way. Not having the right sponsors, sponsors being unavailable, and not knowing the “bigger” project that other teams are working on were common issues.
We got talking about the ways we lose productivity and how we might get some of it back, so I’m sharing a couple of links on this below. If you’d like a notification for my next Success class or any of my training courses, just reply to this and I’ll send you a single email when I have the next date.
Good process
Via my Success course I came up with some reading for students. On HN, the Advice gathered from people smarter than me felt relevant, notably “minor barriers aren’t minor”, “remember what used to work” (which is not necessarily what you do now) and “don’t blunder”. I find “don’t blunder” (or “don’t blow up” in my parlance) useful: if you avoid complete failure you’ll learn and progress, so avoiding complete failure should be a basic goal for any project.
Dan Luu’s posts on Some reasons to work on productivity and p95 skill are also good. He notes that whilst “working on the right problem” gives you a big edge, it can be hard to find the right problem. Focusing on productivity instead gives you actionable steps: learning to type faster, practising writing short influential documents and getting better at public speaking will all give you more bang for the time you spend on a subject. I’ve found myself rereading both of these articles over several months.
One thing for me is trying to avoid wasting time with Pandas - I keep notes on anything that causes me to get confused so I have a mini-cheat-sheet for next time. I also do retrospectives after client engagements to see “what didn’t go so well or so fast and what might I do differently next time?”.
What do you do that saves you time? I’d happily share a tip back here if you reply to this.
If you find this useful please share this issue or archive on Twitter, LinkedIn and your communities - the more folk I get here sharing interesting tips, the more I can share back to you.
Open Source
Sebastian Raschka shares a tip on Twitter for matplotlib’s subplot_mosaic, which easily gives you named grids of subplots. This is totally new to me and I’ve started to use it. Worth a look if you make subplots in matplotlib.
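Here is a minimal sketch of what subplot_mosaic looks like in practice. It is not Sebastian’s example: the panel names ("hist", "scatter", "trend") and the toy data are my own assumptions, chosen only to show how the named grid works.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display; drop for interactive use
import matplotlib.pyplot as plt

# Each string names a panel; repeating a name makes that panel span cells.
# These panel names are illustrative, not part of matplotlib.
layout = [
    ["hist", "scatter"],
    ["trend", "trend"],
]

# subplot_mosaic returns the figure and a dict mapping each name to its Axes
fig, axes = plt.subplot_mosaic(layout, figsize=(8, 6))

axes["hist"].hist([1, 2, 2, 3, 3, 3, 4])
axes["scatter"].scatter([1, 2, 3], [3, 1, 2])
axes["trend"].plot([0, 1, 2, 3], [0, 1, 4, 9])

# Label each panel with its own name so the mapping is visible in the figure
for name, ax in axes.items():
    ax.set_title(name)
```

The win for me is addressing axes by name rather than by row and column index, which tends to survive a layout change. Call fig.savefig(...) or plt.show() to see the result.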
Scikit-learn has a new blog, and a recent entry talks about a sprint to improve joblib, which is my go-to tool for easy parallelisation. Future goals include experimenting with the GIL-less Python branch to enable more thread-based parallelism over process-based parallelism, and providing a single API to unite many parallelisation backends for the simple set of tasks that joblib helps with. If you care about easy parallelism you may want to take a look.
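If you haven’t used joblib before, this is a minimal sketch of the kind of simple task it helps with, using the existing Parallel and delayed API. The slow_square function and the inputs are hypothetical stand-ins for your own work, and this says nothing about what the sprint itself changed.

```python
from joblib import Parallel, delayed


def slow_square(x):
    # Hypothetical stand-in for an expensive, independent piece of work
    return x * x


inputs = range(8)

# Default behaviour: joblib runs the tasks in separate worker processes
results_processes = Parallel(n_jobs=2)(
    delayed(slow_square)(x) for x in inputs
)

# Same call, but hinting that threads are preferred over processes
results_threads = Parallel(n_jobs=2, prefer="threads")(
    delayed(slow_square)(x) for x in inputs
)

print(results_processes)
print(results_threads)
```

Results come back in input order either way. With today’s GIL, threads mostly help when the work releases the GIL (I/O, or NumPy-heavy code), which is why the GIL-less experiments are interesting.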
Random - If Bootcamps taught “real world data science”
This LI post by Vin Vashishta is worth a read - it highlights the extreme differences between “what you learn in bootcamp or college vs what happens in the real world”. If you feel any imposter syndrome, take a look and see which parallels apply to you.
Footnotes
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on twitter, LinkedIn and GitHub.
Now some jobs…
Jobs are provided by readers. If you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400 subscribers 3 times over 6 weeks; subsequent posts are charged.
Principal Consultant at Semantic Partners
Semantic Partners are experiencing significant demand for people with semantic technology skills, and to this end we are looking to hire around 50 people over the next 2 years. Ideally you have some practical Knowledge Graph experience but for those looking to get into Semantics, we are offering the chance to cross train into the technology skills listed below and offer full product training across and into several vendor graph products. About you - fast learner, critical thinking, requirements capture, logical reasoning, conceptual modelling, investigation, Engineering; Python/Java/C#/Javascript, HTML, CSS etc, Preferred skills - SQL, API design, HTTP System architecture
- Rate: Competitive
- Location: Remote
- Contact: Dan Collier dan.collier@semanticpartners.com (please mention this list when you get in touch)
- Side reading: link
Cloud Engineer (Python) - Anglo American
We are a new team at Anglo American (a large mining and minerals company), working on image data at scale. There are several problems with how things are currently, including: storage limited to local hard drives, compute limited to desktops, data silos and difficulties in finding and sharing image data.
The solution that you will help build will use cloud and web technologies to store, search, visualize and run compute on global scale image archives (Terabyte to Petabyte). Your focus will be on using cloud technology to scale up capabilities (e.g. storage, search and compute). You will be building and orchestrating cloud services including serverless APIs, large databases, storage accounts, Kubernetes clusters and more, all using an Infrastructure as Code approach. We work on the Microsoft Azure cloud platform, and are building on top of open-source tools and open-standards such as the Spatio-Temporal Asset Catalog, webmap tiling services such as Titiler, and the Dask parallel processing framework.
- Rate: Competitive day rate
- Location: Remote (UTC +/- 2)
- Contact: samuel.murphy@angloamerican.com (please mention this list when you get in touch)
- Side reading: link
Full Stack developer (Python & React) at Anglo American
We are a new team at Anglo American (a large mining and minerals company), working on image data at scale. There are several problems with how things are currently, including: storage limited to local hard drives, compute limited to desktops, data silos and difficulties in finding and sharing image data.
The solution that you will help build will use cloud and web technologies to store, search, visualize and run compute on global scale image archives (Terabyte to Petabyte). Your focus will be full stack web development, building back-end APIs and front-end interfaces for users to easily access these large image archives. We will be building on open tools and standards written in Python (such as the Spatio-Temporal Asset Catalog, Titiler and Dask), and you will be extending and modifying these, as well as writing new serverless APIs in Python. Front-end development will be within the context of Anglo American frameworks, primarily using React, and will involve map visualization tools such as Leaflet and/or OpenLayers.
- Rate: Competitive day rate
- Location: Remote (UTC +/- 2)
- Contact: samuel.murphy@angloamerican.com (please mention this list when you get in touch)
- Side reading: link
Senior Data Scientist, Experience Team at Spotify
We are looking for a Senior Data Scientist to join our Experience insights team to help us drive and support evidence-based design and product decisions throughout Spotify’s product development process. As part of our team, you will study user behaviour, strategic initiatives, product features and more, bringing data and insights into every decision we make.
What you will do: 1) Co-operate with cross-functional teams of data scientists, user researchers, product managers, designers and engineers who are passionate about our consumer experience. 2) Perform analysis on large sets of data to extract impactful insights on user behaviour that will help drive product and design decisions. 3) Communicate insights and recommendations to stakeholders across Spotify. 4) Be a key partner in our work to build out our product strategy so that we are relevant in the daily lives of consumers.
- Rate:
- Location: Covent Garden, London or Remote within the EMEA region
- Contact: annabeller@spotify.com (please mention this list when you get in touch)
- Side reading: link, link
Backend & Data Engineer
At Good With, you’ll work at the heart of a dynamic multidisciplinary agile team to develop a platform and infrastructure connecting a voice-enabled intelligent mobile app, financial OpenBanking data sources, state of the art intelligent analytics and real-time recommendation engine to deliver personalised financial guidance to young and vulnerable adults.
As a founding member, you’ll get share options in an innovative business, supported by Innovate UK, Oxford Innovation and SETsquared accelerator, with ambitions and roadmap to scale internationally.
Supported by Advisors: Cambridge / FinHealthTech, Paypal/Venmo & Robinhood Brand Exec, Fintech4Good CTO & cxpartners CEO.
Working with: EPIC e-health programme for financial wellbeing & ICO Sandbox for ‘user always owns data’ approaches.
- Rate: £50-65K + Share Options
- Location: Flexible, remote working (Cornwall HQ)
- Contact: gabriela@goodwith.co (please mention this list when you get in touch)
- Side reading: link
Senior Data Scientist & Data Science Manager & Head of Data Engineering at Infogrid, Permanent, London
Infogrid is helping protect the planet and improve the lives of billions of people by making every building a smart building. Our goal is to be the global provider for connected devices in smart buildings. We already handle millions of events every day from tens of thousands of sensors and we’d like you to help us scale that by an order of magnitude over the coming months.
Sustainability is at our heart; buildings account for 39% of global carbon emissions and we’re creating real solutions to impact this! We are still early in our journey but have already achieved a lot: we raised a successful series A funding round, grew 5x in employee numbers within 12 months, and were voted one of the top 10 most flexible places to work.
- Rate:
- Location: Remote UK
- Contact: myriam@infogrid.io (please mention this list when you get in touch)
- Side reading: link, link, link
Senior Data Scientist (Platform) - Signal AI, Full-time, London (UK)
You will be a core player in the growth of our platform. You will work within one of our platform teams to innovate, collaborate, and iterate in developing solutions to difficult problems. Our teams are autonomous and cross-functional, encompassing every role required to build and improve on our products in whatever way we see best. You will be hands-on working on end-to-end product development cycles from discovery to deployment. This encompasses helping your team discover problems and explore the feasibility and value of potential ML-driven solutions; building prototype solutions and conducting offline and online experiments for validation; collaborating with engineers and product managers on bringing further iterations for those solutions into the products through integration, deployment and scaling.
This particular role will initially be within a team whose responsibilities include effectiveness and efficiency of our labelling processes and tool, training, monitoring and deployment of systems and models for entity linking, text classification and sentiment analysis, among others, across multiple data types. This team also works closely with the operation teams to ensure systems and models are properly maintained.
- Rate:
- Location: London (Old Street) - Hybrid model (2 days a week in the office)
- Contact: jiyin.he@signal-ai.com (please mention this list when you get in touch)
- Side reading: link, link, link
Software Engineer, Data Engineer and TPM at Aflorithmic Labs, London (Hybrid)
We’re an audio as a service startup, building an API first solution to add audio to applications. We have customers and we’re fast growing.
As an Audio-As-A-Service, API-first Voice Tech company, our aim is to democratise the way audio is produced. We use AI and “Deepfake for Good” to create beautiful Voice and Audio from simple Text-to-speech, making the creation of beautiful audio content (from simple text) as easy as writing a blog. Join a 23-strong international engineering, voice, R&D and business team made up of 13 nationalities (backgrounds include: Ex-University of Edinburgh, PhDs, European Space Agency, SAP, Amazon).
We’re looking for a data engineer to work on the core data pipelines for our voice-as-a-service and to support our team as it grows. Our stack includes Kubernetes, Python and NodeJS, and we use a lot of kubeflow and the serverless stack.