Automatically finding interesting data relationships, more thoughts on the Unschedule
Thoughts
Below I talk about leading vs lagging indicators for tracking progress, the lux tool for automatically discovering interesting relationships, and share Jesper’s thoughts on the Unschedule that I recently wrote about.
With one of my insurance clients we’ve been talking about using metrics on R&D projects to signal progress. Whilst we’re doing well on our projects, the path is inevitably convoluted. In this keynote talk Brandon Sanderson (prolific author) talks on avoiding “lagging indicators” and using “leading indicators”. In our case “number of cases that need follow-up investigation” would be a lagging indicator (and might not be resolved for many months) but “analyses made this week” is easy to count and communicates progress.
What indicators do you use to measure and communicate progress?
I got interviewed by Douglas Squirrel yesterday as part of his private Squirrel Squadron series - the topic was “how not to use your data science team” and given how much strategic work I do with teams, you’ll know I had strong opinions to share. Squirrel’s built up a nice set of CxO folk who attend his events. It is free for folk at that management level and he’s got more events on AI, security and compliance coming up. Once in, members get access to the past videos. If you or a boss would benefit, do think on joining (it is free, just private and small scale for good conversation).
We talked about the danger of having an “ivory tower” data science unit and instead using a consulting (or hub-spoke) model which keeps data scientists centralised but talking to many clients vs having cross functional teams which embed the DS folk into the business units. Both have their merits. We also spoke on the difficulty of the cold-start problem when you build a new DS unit in a less-mature organisation (solution - focus on the wins that seem feasible and give the team some latitude to go dig, coupled with a PM or Product Owner who understands the business).
Good Process
A month back I wrote on the Unschedule technique for planning “that which matters most” (e.g. exercise, self-time, family) and “filling in the blanks” with achieved work afterwards. I’m still using this technique and it seems I’ve converted Jesper who has written about his process at length (worth a read, as is his newsletter). I asked him for some more notes and he’s replied:
“Why can’t I be normal and productive” was something I have thought many times in my life. Shoutout to the neurodivergent crew. Took me until 30 years old and a global pandemic to figure out that my brain might in fact work differently. With that out of the way, I have been tracking my time for over 10 years, read the productivity books and some people might even consider me productive. Yet the data is pretty clear, I don’t work 20 hours a day. In fact working 8 hours as a knowledge worker was usually too much for me. Reading an account by people as prolific as Ian and David McIver that reflect my experience, has been such a delight.
Then reading about the Unschedule was eye-opening. I have been doing something very similar for years now! Schedule sleep and things that matter first, make time for hobbies and loved ones. Then fit in the work, but I’ve still always tried to fit in the 8h as a block even when it wasn’t required by an employer. So I’ve done the natural thing now and completely overhauled my Google Calendar. Put in mandatory things but left project work open. If I know my brain, the projects and obsessions will happen anyways. And so far they did, which is obviously skewed by the novelty effect. But so far I’ve finally practiced some drums again, hit the gym 3 times, and got more high-quality work done than the slump that was January.
Predominantly, I was asked one question when I told people about the Unschedule: Which Apps do you use? Google Calendar with recurring fixed blocks for the things that matter with weekly time audits to adjust what works and what doesn’t. I dug up Toggl to track “deep work” and Zapier to automatically put those times on my calendar for posterity. Finally, and this one’s a bit extreme for most, I use Beeminder that keeps me accountable by using commitment contracts. The possibility of losing money over a task, e.g. not writing a newsletter issue, directly shortcuts my dopamine cycle to prioritize things I know are important during planning phases and cut through procrastination.
What do you do that saves you time? I’d happily share a tip back here if you reply to this.
If you find this useful please share this issue or archive on Twitter, LinkedIn and your communities - the more folk I get here sharing interesting tips, the more I can share back to you.
Open source
I’ve just started to play with lux which is a pandas-profiling equivalent using the Altair plotting framework. Pandas-Profiling is a superb EDA tool that uses web based graphing to explore univariate descriptions of DataFrame columns in a Notebook. Lux does the same but tries to expose interesting and interactive relationships automatically.
I’ve just been teaching a scikit-learn course using the French Auto 3rd Party insurance dataset and, having just used lux, I’ve seen some relationships in the data that I hadn’t noticed in my manual exploration. Here younger vehicles have a higher no-claims (no-accident) penalty (old vehicles have the lowest penalty) and there’s a linear-ish relationship between driver age and no-claims penalty (younger folk get the higher penalty).
Once you’re done exploring the visualisations you can click on one - it’ll augment your DataFrame with the code necessary to make a stand-alone visualisation in a .exported attribute. This is very nice - no need to guess at how you can recreate the plot, you just ask for the code that you need!
When I teach my Success course I’m always talking about automating discovery to avoid missing interesting things - this tool looks like another one for my toolkit. Have you tried it? Do you have other recommendations that help expose useful relationships?
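To make “automating discovery” concrete, here is a minimal sketch of the general idea. It is not lux’s API and not how lux works internally: it simply ranks every pair of numeric columns by absolute Pearson correlation so the strongest linear relationships float to the top. The column names and every value in `rows` are invented for illustration and are not taken from the insurance dataset, and `pearson` and `rank_relationships` are hypothetical helper names.

```python
from itertools import combinations
from math import sqrt

# Invented toy rows for illustration only; NOT the real insurance data.
# Column names are hypothetical stand-ins.
rows = [
    {"vehicle_age": 1, "driver_age": 22, "bonus_malus": 95, "vehicle_power": 6},
    {"vehicle_age": 3, "driver_age": 30, "bonus_malus": 85, "vehicle_power": 4},
    {"vehicle_age": 5, "driver_age": 41, "bonus_malus": 68, "vehicle_power": 7},
    {"vehicle_age": 8, "driver_age": 35, "bonus_malus": 72, "vehicle_power": 5},
    {"vehicle_age": 12, "driver_age": 58, "bonus_malus": 52, "vehicle_power": 6},
    {"vehicle_age": 15, "driver_age": 64, "bonus_malus": 50, "vehicle_power": 4},
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_relationships(rows):
    """Return (col_a, col_b, r) tuples, strongest |r| first."""
    columns = list(rows[0])
    scored = []
    for a, b in combinations(columns, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        scored.append((a, b, r))
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


ranked = rank_relationships(rows)
for a, b, r in ranked:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

If your data is already in a pandas DataFrame, `df.corr()` gives the same pairwise Pearson matrix in one call. Correlation only catches linear relationships between numeric columns, so treat this as a first pass before looking at the plots.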
Footnotes
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on twitter, LinkedIn and GitHub.
Now some jobs…
Jobs are provided by readers. If you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400 subscribers 3 times over 6 weeks; subsequent posts are charged.
NLP Data Scientist at Shell, Permanent, London
Curious about the role of NLP in the energy transition? Wondering how we can apply NLP to topics such as EV charging, biofuels and green hydrogen? If you are enthusiastic about all things NLP and are based in the UK, come and join us. We have an exciting position for an NLP Data Scientist to join Shell’s AI organization. Our team works on several projects across Shell’s businesses focusing on developing end-to-end NLP solutions.
As an NLP Data Scientist, you will work hands-on across project teams focusing on research and implementation of NLP models. We offer a friendly and inclusive atmosphere, time to work on creative side projects and run a biweekly NLP reading group.
- Rate:
- Location: London (Hybrid)
- Contact: merce.ricart@shell.com (please mention this list when you get in touch)
- Side reading: link
Data Scientist at EDF Energy, Permanent
If you’re an experienced Data Scientist looking for their next challenge, then we have an exciting opportunity for you. You’ll be joining a team who are striving to build a world class Data Centre of excellence helping Britain achieve Net Zero and delivering value across the customers business.
If you’re a self-starter and someone who has hands-on experience with deploying data science and machine learning models in a commercial environment, then this is the perfect role for you.
You will also be committed to building an inclusive, diverse and value focussed culture within the Data & CRM team with a dedication to lead by example and act as a mentor for the junior members within the team.
- Rate:
- Location: Remote with occasional travel to our offices in Croydon or Hove
- Contact: gavin.hurley@edfenergy.com (please mention this list when you get in touch)
- Side reading: link
Principal Consultant at Semantic Partners
Semantic Partners are experiencing significant demand for people with semantic technology skills, and to this end we are looking to hire around 50 people over the next 2 years. Ideally you have some practical Knowledge Graph experience but for those looking to get into Semantics, we are offering the chance to cross train into the technology skills listed below and offer full product training across and into several vendor graph products. About you - fast learner, critical thinking, requirements capture, logical reasoning, conceptual modelling, investigation, Engineering; Python/Java/C#/Javascript, HTML, CSS etc, Preferred skills - SQL, API design, HTTP System architecture
- Rate: Competitive
- Location: Remote
- Contact: Dan Collier dan.collier@semanticpartners.com (please mention this list when you get in touch)
- Side reading: link
Cloud Engineer (Python) - Anglo American
We are a new team at Anglo American (a large mining and minerals company), working on image data at scale. There are several problems with how things are currently, including: storage limited to local hard drives, compute limited to desktops, data silos and difficulties in finding and sharing image data.
The solution that you will help build will use cloud and web technologies to store, search, visualize and run compute on global scale image archives (Terabyte to Petabyte). Your focus will be on using cloud technology to scale up capabilities (e.g. storage, search and compute). You will be building and orchestrating cloud services including serverless APIs, large databases, storage accounts, Kubernetes clusters and more, all using an Infrastructure as Code approach. We work on the Microsoft Azure cloud platform, and are building on top of open-source tools and open-standards such as the Spatio-Temporal Asset Catalog, webmap tiling services such as Titiler, and the Dask parallel processing framework.
- Rate: Competitive day rate
- Location: Remote (UTC +/- 2)
- Contact: samuel.murphy@angloamerican.com (please mention this list when you get in touch)
- Side reading: link
Full Stack developer (Python & React) at Anglo American
We are a new team at Anglo American (a large mining and minerals company), working on image data at scale. There are several problems with how things are currently, including: storage limited to local hard drives, compute limited to desktops, data silos and difficulties in finding and sharing image data.
The solution that you will help build will use cloud and web technologies to store, search, visualize and run compute on global scale image archives (Terabyte to Petabyte). Your focus will be full stack web development, building back-end APIs and front-end interfaces for users to easily access these large image archives. We will be building on open tools and standards written in Python (such as the Spatio-Temporal Asset Catalog, Titiler and Dask), and you will be extending and modifying these, as well as writing new serverless APIs in Python. Front-end development will be within the context of Anglo American frameworks, primarily using React, and will involve map visualization tools such as Leaflet and/or OpenLayers.
- Rate: Competitive day rate
- Location: Remote (UTC +/- 2)
- Contact: samuel.murphy@angloamerican.com (please mention this list when you get in touch)
- Side reading: link
Senior Data Scientist, Experience Team at Spotify
We are looking for a Senior Data Scientist to join our Experience insights team to help us drive and support evidence-based design and product decisions throughout Spotify’s product development process. As part of our team, you will study user behaviour, strategic initiatives, product features and more, bringing data and insights into every decision we make.
What you will do: 1) Co-operate with cross-functional teams of data scientists, user researchers, product managers, designers and engineers who are passionate about our consumer experience. 2) Perform analysis on large sets of data to extract impactful insights on user behaviour that will help drive product and design decisions. 3) Communicate insights and recommendations to stakeholders across Spotify. 4) Be a key partner in our work to build out our product strategy so that we are relevant in the daily lives of consumers.
- Rate:
- Location: Covent Garden, London or Remote within the EMEA region
- Contact: annabeller@spotify.com (please mention this list when you get in touch)
- Side reading: link, link
Backend & Data Engineer
At Good With, you’ll work at the heart of a dynamic multidisciplinary agile team to develop a platform and infrastructure connecting a voice-enabled intelligent mobile app, financial OpenBanking data sources, state of the art intelligent analytics and real-time recommendation engine to deliver personalised financial guidance to young and vulnerable adults.
As a founding member, you’ll get share options in an innovative business, supported by Innovate UK, Oxford Innovation and SETsquared accelerator, with ambitions and roadmap to scale internationally.
Supported by Advisors: Cambridge / FinHealthTech, Paypal/Venmo & Robinhood Brand Exec, Fintech4Good CTO & cxpartners CEO.
Working with: EPIC e-health programme for financial wellbeing & ICO Sandbox for ‘user always owns data’ approaches.
- Rate: £50-65K + Share Options
- Location: Flexible, remote working. Cornwall HQ
- Contact: gabriela@goodwith.co (please mention this list when you get in touch)
- Side reading: link
Senior Data Scientist & Data Science Manager & Head of Data Engineering at Infogrid, Permanent, London
Infogrid is helping protect the planet and improve the lives of billions of people by making every building a smart building. Our goal is to be the global provider for connected devices in smart buildings. We already handle millions of events every day from tens of thousands of sensors and we’d like you to help us scale that by an order of magnitude over the coming months.
Sustainability is at our heart; buildings account for 39% of global carbon emissions and we’re creating real solutions to impact this! We are still early in our journey but have already achieved a lot; we raised a successful series A funding round, grew 5x in employee numbers within 12 months, and were voted one of the top 10 most flexible places to work.