Lessones learned at the Executives at PyData discussion session from PyDataLondon 2023
Lessones learned at the Executives at PyData discussion session from PyDataLondon 2023
Further below are 3 job roles including Senior roles in DS and generative DNNs at organisations like QualisFlow, Jaguar Landrover and Gradient Labs
This issue is rather delayed. I've switched gears, taking time to focus on other projects for a few months including a charity banger-car driving event (MotoScape Car Rally - below) before tackling the new RebelAI leadership community that I've mentioned in past issues.
The car below is our chosen "banger" for the charity car drive this September (this is the one I didn't blow up), details below if you want to hear a silly story.
A month back we had our PyDataLondon conference, it was great to see so many people IRL again. I need to write-up the talk that Giles Weaver and I gave on Pandas 2 vs Polars vs Dask, we had some great conversations around that. That should come for the next issue.
At the conference I ran another of my Executivates at PyData discussion sessions for leaders, I've got notes on that below.
Our PyDataLondon meetup is looking for new speakers - would you like to give a lightning talk (5 minutes) or a main talk (30 minutes) to 100+ people? Submit a talk - we'd especially like to hear from you if you previously submitted a talk to the conference that didn't get accepted (sorry about that - demand is always high for a small number of speaking slots). We'd love to have you talk at the meetup!
High Performance Python 2nd ed (current) and 3rd ed (next year?)
I'm very pleased to say that we've had yet another international edition published - this time High Performance Python 2nd ed Japanese, on top of Korean, Polish and Italian.
Micha and I have just started to talk about the 3rd edition (for next year), perhaps to add Dask and Polars, more Pandas, probably a deeper dive into optimising DNNs. What would you like to see in the next edition?
PyDataLondon
At the conference I ran another of my Executives at PyData discussion sessions, aimed at team leaders who have challenges to solve. As usual I solicit questions from the room, we take a vote and then we discuss the top-voted - here's the list we started with:
One interesting change was that last year we had a huge need to discuss "the challenge of remote" and that spilled over into a second hour's discussion. This year almost nobody wanted to talk about it! It feels like a far more "solved" issue now. This year instead we spilled over for a second hour discussing strategic challenges to making DS projects run and deliver success.
Enabling a good data science ecosystem (tactical) and building the foundations for great projects (strategic)
John opened by noting that selling the details of a project to Execs is hard, Bene countered to just sell the strategic benefit to the Execs and keep the details out of the discussion (I'd agree!). Puja talked of selling the successful end result and balancing that with the acceptance that if you give yourself enough rope you have to own your mistakes - and you must also not be bullied by Execs who try to push projects in new directions.
Steve brought up a critical question - what actually is success? If it isn't well-defined, you'll have a heck of a hard time setting up an environment that enables it. Liev suggested that a pre mortem (wp) looking at what-could-fail-and-why helps the team to think through what's likely to be tricky, so they can design against these issues.
Sema noted that in mature DS teams where targets can be well defined, setting up cross-functional teams coupled with a Product Manager who understands their KPIs helps to keep discussions sane and lets new projects have sensible baseline expectations.
Nel noted that writing stuff down is underappreciated (I totally agree - I quite like "living documents" when doing things like project specifications). Getting teams to write stuff down, then refer back to it, keeps everyone on board whilst the specifications and understandings evolve as the project unfolds.
Gabriel discussed the need to have both a bottom-up drive to run new projects and a top-down drive to make delivering new value possible. Raphael talked about a new team whose sole job was to unlock procedures that were otherwise hampering the delivery of value and that helped the DS team get issues unblocked so projects got shipped and delivered value. There's so much more to DS than "getting the data and applying the algorithm"!
The big message seemed to be that whilst Execs want a shiny-happy-picture (and hopefully a guaranteed result), you need people who understand the risks inherent in that line of business plus a papertrail to keep everyone accountable to what's agreed.
Boring technology vs the new and shiny - getting the team to focus
The challenge posed by M. was in having 30k lines of code over 5 live projects with oversight by external agencies, and keeping the team focused on "reliable boring technology" rather than chasing new tools and ideas.
That's a tricky one as "keeping the lights on" is essential (especially when subject to oversight), but some level of play is typically sought by problem-solving folk (and often DS folk are problem-solvers). Nel noted that there's a lot of value in legacy systems in large organisations, so keeping the team focused is key.
James suggested keeping the teams aware of the ultimate business value behind the current work (it is easy to lose sight of why we're doing clever tech projects).
Michael says they help juniors 1/2 day a week to work on a personal project with a generous GPU allowance (e.g. for LLM experimentation) to balance the majority-time need to stay focused on the core systems.
The theme definitely focused on the need to deliver useful value on the boring systems, with a need to balance some "fun" activities for staff (e.g. self-driven play) to satisfy that core need to learn new things.
I've had to discuss this theme with Execs in the past who want DS teams to work "like other teams in the org - show up, do the work, go home". And so clearly that's not how many DS folk are motivated.
Recruitment challenges for DS teams
This discussion focused more on making new hires quick effective, and not losing existing staff if possible.
Gabriel noted that hiring smart engineers and supporting them to learn enough data science was typically more productive if you didn't need cutting-edge science but you did need deployable products that make an impact. I've come across this opinion increasingly as teams mature - there's so much low hanging fruit, if you've got people who know how to get the data and deploy a thing that works.
This led to a discussion about labelling jobs "applied DS" vs "research DS", where "applied" focuses on business value. Labelling jobs as such helps people to self-select the right role to put themselves forwards for.
Cherry reminded everyone that some candidates will be looking for more than a salary and a work-package and they'll be looking for things like engagement in the wider community and being able to talk at events and contribute to open source projects.
Steve emphasised the need for good engineering practices (that's why a teach a course on software engineering!) else people get frustrated and leave due to a lack of impact. Hackathons every quarter let people play with new ideas and it lets Exec look at the new ideas which might in turn be promoted into active projects for the company.
Ludmilla noted that pairing a junior+senior helps everyone collaborate, Steve added that sometimes you need to pick those who "want to pair" so those under pressure to ship don't feel they've got a time-distraction dumped on them. Personally I think pairing - some of the time - is an underappreciated high-impact activity (I've found it draining, but I've definitely had >2x output). Steve also noted that accounting for pairing in the training budget avoids discussions about "wasted time".
A couple of people noted that a lot of people are happy to jump between roles to new organisations as compensation is easy to chase if you're experienced - and then people forget that it costs 3x a salary to get a new hire trained up and productive (thinking about lost time on current projects, the distraction of training a new hire, hoping that hire makes it through probation!). Execs need reminding to be sensible about compensation, else they risk expensive churn in the current environment. Even after a year of experience a 20-30% pay bump was noted as "common".
Conclusion
Overall we had a 2 hour discussion on a set of critical topics by a strong group of leaders - I really enjoy running these sessions in-person at the conference. I thank everyone who attended. I run strategic consulting with teams on the topic of "success", feel free to reply to this if that might be useful in your team.
MotoScape Car Rally - charity banger run to Venice in September
So two buddies (hi Ed! hi Rob!) and I are doing a charity car drive in September, driving from London to Venice and back in a week on an organised car trip. All funds raised go to Parkinson's Research, we've funded buying the car ourselves. I'm going to ask you to make a donation if you've found value in my newsletter, my public talks or in the PyData meetups and conferences I've helped organise. You don't have to but...it would be nice.
Presumably if you've been reading this newsletter for a while you've found some value, right?
Why? Being a new parent during lock-down (no parent/toddler groups, no cafes, and prior to vaccines - it was hard even to ask for family support) was hard for my wife and I. Having boot-strapped a perfectly solid and happy little walking neural network, I figure it is time to start doing some silly endeavours not related to work (being housebound was madness-inducing).
A charity car drive, with a stipulation that you need to buy a car costing under £1,000, seemed like a silly thing to do. So we're doing it in the first week of September with 30+ other teams, everyone is self-funding and raising money for charity.
This car is a Volkswagen Passat '99, so 24 years old, costing £650, from the picture you'll notice that it has dropped-suspension and, if you check the justgiving link, you'll see the big-boy-double-exhaust. These modifications were made by the last-last owner. They add character. We just have to go slow over speed-bumps.
Due to the dropped suspension we've gone with the theme "Hawaii Five-Low". Yesterday we sourced some surfboards, I've got to go collect a roof-rack from West London for them. You'll get the impression that this is a labour of love and silliness.
If you want to see details of the car I blew up - our first car - you should click the link, you'll need to "show more" at the base of the page to see the Volvo V50 which was our first chosen car. I'll explain that in a later issue. You don't want to get caught in a car during a turbo runaway event, that's something I learned.
Our second car, the shiny silver Volkswagen Passat, is now running fine (excepting the broken rocker cover, the broken dipstick, the initially-illegal tyres, the missing air filter and other such fun). I'll share more updates in the next issue:
Our JustGiving page has the details. Please - make a donation if any of my public work has been helpful to you, I'd much appreciate it.
Open source - a richer describe
in Pandas
In recent issues I've had a new-style open source section covering tools like sktime
, featureengine
, dtreeviz
and spacy
. Atharva has been contributing these for me. Was it useful? I'm happy to talk to Atharva about doing more, if they're valuable. Did you find something you didn't know before? For this issue I've just got a small update from some tools I noticed recently.
A little while back skimpy was brought up in conversation - a "light" pandas profiling
(which has changed name to ydata-profiling). skimpy
gives a text-only summary of a much richer df.describe()
, whilst ydata-profiling
does a full-on univariate and correlation analysis with rich graphical output. I think it is probably useful if you want a richer describe
. Current it is pinned to Pandas ~= 1.3, not the new 2.0 release (I've requested a new release). ydata-profiling
however recently got updated for Pandas 2.0.
Whilst digging into this I read the skimpy
pyproject.toml and was surprised to see package name requirements like pandas = "^1.3.2"
and it took a few minutes of googling to realise that this is a Poetry PyProject SemVer configuration option (which seems to go beyond requirements.txt
) allowing ">= 1.3.2 but < 2.0.0". I don't recall ever seeing this before, but I don't use Poetry. I continue to feel behind the times when it comes to Python package requirements, with only 20 years experience around it.
Footnotes
See recent issues of this newsletter for a dive back in time. Subscribe via the NotANumber site.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I'm also on twitter, LinkedIn and GitHub.
Now some jobs…
Jobs are provided by readers, if you’re growing your team then reply to this and we can add a relevant job here. This list has 1,600+ subscribers. Your first job listing is free and it'll go to all 1,600 subscribers 3 times over 6 weeks, subsequent posts are charged.
Senior Software Engineer at Qualis Flow Limited
Your team and your role
We’re looking for someone that will be responsible for designing and developing the software that powers our products. You’ll need to collaborate with other teams, write high-quality code and ensure the codebase follows best practices.
You’ll be working in our Engineering team, working closely with Product and other technical teams and reporting to the team lead.
- Design, develop, and maintain the core engine that powers our products, ensuring scalability, performance, and reliability
- Write high-quality, maintainable code that is well-documented and tested. We are a fan of TDD
- Deeply collaborate with other engineers. We are fans of promiscuous pair-programming and mob programming
- Ensure the codebase follows best practices for software development, such as using appropriate design patterns, writing clean and modular code, and ensuring that the codebase is easy to understand and maintain
- Continuously improve our development processes and technologies to ensure high-quality software delivery
- Participate in code reviews and provide feedback to other engineers on their code
- Work closely with the Product team to translate product requirements into technical specifications
- Collaborate with other internal teams to develop software that meets the needs of the business and our customers
- Contribute to the technical direction of the team and provide ideas and input on architectural decisions
- Provide technical guidance and mentorship to more junior members of the team
- Always have an eye on the big picture to avoid getting lost in the weeds
Your skills
- Strong experience in an engineering environment, with a focus on building complex, scalable, and performant systems
- Expertise in Python
- Having NLP or Image processing experience is a bonus, but not mandatory
- Experience with modern AI applications, prompt engineering and LLMs is a bonus, but not mandatory
- Previous backend development experience beneficial
- Strong knowledge of Cloud architecture, instrumentation, and CI/CD and git (not just commit and push 😀)
- Excellent communication and interpersonal skills, with the ability to convey complex ideas and data to diverse audiences
-
Proven experience in guiding and mentoring other engineers
-
Rate: 80-90k
- Location: Hybrid, Office at London Bridge
- Contact: sam.joseph@qualisflow.com (please mention this list when you get in touch)
- Side reading: link, link, link
Analytics & Digital Data Scientist at JLR, Permanent, Gaydon UK
This role sits within the new JLR Supply Chain Analytics Team – an embedded cross-disciplinary group of data professionals who move fast using the latest technologies, sharing best practices with colleagues in JLR central digital. In this delivery role you’ll be part of a fun, inclusive, learning, flexible, value focussed team backed by growing supply chain domain expertise and fearless in the face of a thorny technical problem or highly complex data. You will assist in the design and delivery of transformational Digital products to help streamline the way we work, enabling us to meet future vehicle sales volumes and customer delivery expectations in this unprecedented era of global supply chain disruption.
You will use your mathematical skills and scientific approach to help with the formulation and implementation of our Digital strategy for Planning, particularly the use of Optimisation and forecasting models to guide planning scenarios. You will use your technical knowledge to translate the business question into the correct mathematical formulation and technically implement it, assisting the business with interpreting and diagnosing the output and proactively suggesting improvements.
- Rate:
- Location: Gaydon (Midlands) UK, hybrid
- Contact: aburrel1@jaguarlandrover.com or apply via the link (please mention this list when you get in touch)
- Side reading: link, link
3-6 month Applied AI Residency at Gradient Labs
Gradient Labs is a new AI startup in London. We’re launching an AI Residency Program: we’re looking for an early-career AI researcher to join us for 3-6 months to experience what day 0 at an AI startup in London is like.
We’re building a suite of LLM-based agents that can safely automate customer support conversations in complex environments. You’ll be working directly with a small team— us (me, Danai Antoniou, Dimitri Masin) and any other early hires we make. Join us to spent your time researching, prototyping, optimising, and analysing what our agents can or should do. Help us build that functionality and evaluate how well it works.
- Rate: Approx £60k
- Location: London (in person 1-2 days per week)
- Contact: neal@gradient-labs.ai, dimitri@gradient-labs.ai (please mention this list when you get in touch)
- Side reading: link