Learning to estimate, supporting open source creators, upcoming PyData conferences
Thoughts
Given some of the big events happening in the world, I’ve focused more of my time on “estimation”; I’ve made notes on this and the Metaculus site below. I’ve added a brief word on Ukraine (in short - if you want to do something useful, make a donation to a reputable organisation). I’ve noted Simon Willison’s suggestion to help open source maintainers get support from companies, and I list the PyDataLondon and PyDataBerlin conference dates at the end.
There are 8 roles further below including Senior DS, data engineers, Principal roles and more.
Ukraine
Generally I prefer keeping my newsletter (and public talks) non-political. In this case I’m going to state that I do not support Russia’s war and I hope for a quick conclusion (preferably one that keeps Ukraine Ukrainian). I’ve donated already and will likely donate again. If you haven’t been involved, please consider donating (to an aid agency or for more direct support). Money or other direct support is far more useful than a few retweets.
Good process - estimating
In past issues I’ve mentioned the Metaculus predictions site; it hosts a wide set of questions requiring accurate estimates and I take part in some. There’s no way to bet money on the outcomes as it isn’t an exchange (some competitions have prize money, but I’m ignoring that) - mostly you just have a score and solid discussion. Estimating under high uncertainty and without much evidence is an interesting intellectual challenge, a valuable skill to build, and it can produce useful outcomes for others to use.
One question I participate in is “Will 5M Ukrainians become refugees during 2022?”. By thinking through the evidence and making my own estimate of the likelihood, I realise that I have better-formed arguments in my back pocket for when a politician gives their soundbite. By integrating multiple pieces of evidence, an estimate is synthesised that could be used by aid agencies or governments alongside the estimates from traditional sources.
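If you’re wondering what “integrating multiple pieces of evidence” could look like in practice, here is a minimal sketch - not the method Metaculus uses, just one common way to pool forecasts - that averages several probability estimates in log-odds space. The three input numbers are illustrative placeholders, not real forecasts.

```python
import math


def pool_log_odds(probabilities):
    """Pool several probability estimates by averaging them in log-odds space.

    Averaging in log-odds rather than probability space is one common way to
    aggregate forecasts drawn from different evidence sources.
    """
    log_odds = [math.log(p / (1 - p)) for p in probabilities]
    mean_log_odds = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean_log_odds))


# Hypothetical estimates from three different evidence sources (placeholders)
estimates = [0.6, 0.75, 0.9]
print(f"Pooled estimate: {pool_log_odds(estimates):.2f}")  # roughly 0.77
```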
There’s a bunch of challenges around key Ukrainian events. Many hinge on evidence of Russia’s advance, and that’s not something a belligerent force tends to publicise in detail. It turns out, however, that the Oryx crowd-sourced site is doing just this - taking credible photographic evidence from social media and estimating losses (for Russia). These numbers have been checked by others.
The bit that boggles my mind is that “yet another source of ‘pretty good’ data” appears where we didn’t expect to see one, and I don’t have to be a large government to get the same kinds of counts that a General might receive. Every year more and more data becomes open and quickly available. Whilst a skill like machine learning is useful, there’s a growing role for anyone who can take disparate information and integrate it against sensible questions in a timely fashion.
If your estimation skills aren’t so hot and you’d like to improve them, there’s a really nice (and chilled) tutorial on the site. You might then dip into “Will 2022 be the hottest year on record?” and see trivial (but not silly), less trivial and then more involved estimation methods.
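As a flavour of what a “trivial but not silly” approach can look like (this is my sketch, not the tutorial’s), a first pass is a simple base rate: count how often a new record was set over recent years and use that frequency as your starting probability, before refining it with the warming trend or other evidence. The flags below are placeholders, not real temperature data.

```python
# 1 if that year set a new global temperature record, 0 otherwise
# (placeholder values for illustration only - not real data)
record_flags = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0,
                0, 1, 0, 1, 0, 0, 1, 0, 0, 1]

base_rate = sum(record_flags) / len(record_flags)
print(f"Base-rate estimate that the coming year sets a record: {base_rate:.0%}")
```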
I’m particularly interested in the green-power challenges; watching events unfold and decrease the chance that a German nuclear power plant returns to operation was sobering. Trying to estimate US electric vehicle sales in 2022 with scant evidence is also pretty tricky.
Figuring out how to estimate timelines, costs and complexity for projects is hard and is a skill that many of us - managers included - are poor at. Here’s a site that can help you learn that skill whilst becoming more involved in significant events. Maybe it is worth a bit of your time.
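On the project side, one widely taught starting point (not something from the Metaculus tutorial - just a classic technique) is a three-point estimate: take an optimistic, a most-likely and a pessimistic guess and combine them with the PERT weighting. The figures below are made up for illustration.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Classic PERT three-point estimate: a weighted mean that leans towards
    the most-likely value while still accounting for the optimistic and
    pessimistic tails.
    """
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Hypothetical effort figures for a small data science project, in days
print(pert_estimate(optimistic=5, most_likely=10, pessimistic=30))  # 12.5
```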
How did you improve your estimation skills? I’d happily share a tip back here if you reply to this.
If you find this useful please share this issue or archive on Twitter, LinkedIn and your communities - the more folk I get here sharing interesting tips, the more I can share back to you.
Open Source
Simon Willison proposes that OS maintainers add a consulting.md file to their repo offering “Invite me to give a 1hr q&a session for $X” as an easy way for corporates to support an open source project. No preparation is needed and it is an easy sign-off - $1000 gets easy approval for an hour with the maintainer of a tool that’s important to the team.
I think the idea has merit. I’m not sure that my projects are popular enough to get much visibility with this approach, but I know from my own work that giving a round number (I’d go far north of $1k) for some of your time in exchange for a presentation and q&a with a motivated group is an easy sign-off for a lot of organisations. Maybe this is something you can try, especially if you’ve got budget and think your team could learn useful techniques from an open source developer?
Conferences
We’ve got PyDataBerlin coming up on April 11-13 as a part of PyConDE, and PyDataLondon on June 17-19 at the usual location (the Tower Hotel, not far from London Bridge station). For PyDataLondon we’re arranging a smaller capacity to enable more distancing; the Call for Proposals is open for both.
PyDataLondon will be an in-person (not hybrid) conference and I believe that PyDataBerlin is doing the same. It is both exciting and a bit scary (well, for me) to think about being around a lot of other people again. I had a child during lockdown and practised very safe distancing, to the extent that my wife and I went a bit nuts and had to remember to re-engage with the wider world to save our heads.
If you have questions about PyDataLondon - particularly about sponsoring - you can reply to me directly. I’m a part of the organising group and, if you want to meet smart people to hire, sponsoring is a very sensible idea. Hit me up if you’d like to discuss what that means. I’m likely to help organise the “Execs at PyData” session again, aimed at managers visiting PyData, and I’ll probably also set up another pre-conference Briefings event on state-of-the-art topics as we had for PyDataGlobal. Is there anything you’d like to see?
If you could retweet this announcement it would be appreciated. All money raised from our volunteer-run conferences goes back to NumFOCUS, who support our PyData core packages and much more - see supported projects here.
If you find this useful please share this issue or archive on Twitter, LinkedIn and your communities - the more folk I get here sharing interesting tips, the more I can share back to you.
Footnotes
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on Twitter, LinkedIn and GitHub.
Now some jobs…
Jobs are provided by readers; if you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400+ subscribers 3 times over 6 weeks; subsequent posts are charged.
Senior Data Scientist at Caterpillar, permanent, Peterborough
Caterpillar is the world’s leading manufacturer of construction and mining equipment, diesel and natural gas engines, industrial gas turbines and diesel-electric locomotives. Data is at the core of our business at Caterpillar, and there are many roles and opportunities in the Data Science field. The Industrial Power Systems Division of Caterpillar currently has an opportunity for a Senior Data Scientist to support power system product development engineers with data insights, and to develop digital solutions for our customers to maximise the value they get from their equipment through condition monitoring.
As a Senior Data Scientist, you will work across, and lead, project teams to implement analytical models and data insights on a variety of telemetry and test data sources, in a mechanical product development environment.
- Rate: £50,000 to £55,000 (depending on experience) with up to 12% bonus
- Location: Peterborough (flexible working considered)
- Contact: sheehan_dan@cat.com (please mention this list when you get in touch)
- Side reading: link, link
NLP Data Scientist at Shell, Permanent, London
Curious about the role of NLP in the energy transition? Wondering how we can apply NLP to topics such as EV charging, biofuels and green hydrogen? If you are enthusiastic about all things NLP and are based in the UK, come and join us. We have an exciting position for an NLP Data Scientist to join Shell’s AI organization. Our team works on several projects across Shell’s businesses focusing on developing end-to-end NLP solutions.
As an NLP Data Scientist, you will work hands-on across project teams focusing on research and implementation of NLP models. We offer a friendly and inclusive atmosphere, time to work on creative side projects and run a biweekly NLP reading group.
- Rate:
- Location: London (Hybrid)
- Contact: merce.ricart@shell.com (please mention this list when you get in touch)
- Side reading: link
Data Scientist at EDF Energy, Permanent
If you’re an experienced Data Scientist looking for your next challenge, then we have an exciting opportunity for you. You’ll be joining a team who are striving to build a world-class Data Centre of Excellence, helping Britain achieve Net Zero and delivering value across the Customers business.
If you’re a self-starter and someone who has hands-on experience with deploying data science and machine learning models in a commercial environment, then this is the perfect role for you.
You will also be committed to building an inclusive, diverse and value-focussed culture within the Data & CRM team, with a dedication to lead by example and act as a mentor for the junior members within the team.
- Rate:
- Location: Remote with occasional travel to our offices in Croydon or Hove
- Contact: gavin.hurley@edfenergy.com (please mention this list when you get in touch)
- Side reading: link
Principal Consultant at Semantic Partners
Semantic Partners are experiencing significant demand for people with semantic technology skills, and to this end we are looking to hire around 50 people over the next 2 years. Ideally you have some practical Knowledge Graph experience, but for those looking to get into Semantics we offer the chance to cross-train into the technology skills listed below, with full product training across several vendor graph products. About you: fast learner, critical thinking, requirements capture, logical reasoning, conceptual modelling, investigation. Engineering: Python/Java/C#/JavaScript, HTML, CSS etc. Preferred skills: SQL, API design, HTTP, system architecture.
- Rate: Competitive
- Location: Remote
- Contact: Dan Collier dan.collier@semanticpartners.com (please mention this list when you get in touch)
- Side reading: link
Cloud Engineer (Python) - Anglo American
We are a new team at Anglo American (a large mining and minerals company), working on image data at scale. There are several problems with how things currently work, including storage limited to local hard drives, compute limited to desktops, data silos, and difficulties in finding and sharing image data.
The solution that you will help build will use cloud and web technologies to store, search, visualize and run compute on global-scale image archives (Terabyte to Petabyte). Your focus will be on using cloud technology to scale up capabilities (e.g. storage, search and compute). You will be building and orchestrating cloud services including serverless APIs, large databases, storage accounts, Kubernetes clusters and more, all using an Infrastructure as Code approach. We work on the Microsoft Azure cloud platform, and are building on top of open-source tools and open standards such as the Spatio-Temporal Asset Catalog, web-map tiling services such as Titiler, and the Dask parallel processing framework.
- Rate: Competitive day rate
- Location: Remote (UTC +/- 2)
- Contact: samuel.murphy@angloamerican.com (please mention this list when you get in touch)
- Side reading: link
Full Stack developer (Python & React) at Anglo American
We are a new team at Anglo American (a large mining and minerals company), working on image data at scale. There are several problems with how things currently work, including storage limited to local hard drives, compute limited to desktops, data silos, and difficulties in finding and sharing image data.
The solution that you will help build will use cloud and web technologies to store, search, visualize and run compute on global scale image archives (Terabyte to Petabyte). Your focus will be full stack web development, building back-end APIs and front-end interfaces for users to easily access these large image archives. We will be building on open tools and standards written in Python (such as the Spatio-Temporal Asset Catalog, Titiler and Dask), and you will be extending and modifying these, as well as writing new serverless APIs in Python. Front-end development will be within the context of Anglo American frameworks, primarily using React, and will involve map visualization tools such as Leaflet and/or OpenLayers.
- Rate: Competitive day rate
- Location: Remote (UTC +/- 2)
- Contact: samuel.murphy@angloamerican.com (please mention this list when you get in touch)
- Side reading: link
Senior Data Scientist, Experience Team at Spotify
We are looking for a Senior Data Scientist to join our Experience insights team to help us drive and support evidence-based design and product decisions throughout Spotify’s product development process. As part of our team, you will study user behaviour, strategic initiatives, product features and more, bringing data and insights into every decision we make.
What you will do: 1) Co-operate with cross-functional teams of data scientists, user researchers, product managers, designers and engineers who are passionate about our consumer experience. 2) Perform analysis on large sets of data to extract impactful insights on user behaviour that will help drive product and design decisions. 3) Communicate insights and recommendations to stakeholders across Spotify. 4) Be a key partner in our work to build out our product strategy so that we are relevant in the daily lives of consumers.
- Rate:
- Location: Covent Garden, London or Remote within the EMEA region
- Contact: annabeller@spotify.com (please mention this list when you get in touch)
- Side reading: link, link
Backend & Data Engineer at Good With
At Good With, you’ll work at the heart of a dynamic, multidisciplinary agile team to develop a platform and infrastructure connecting a voice-enabled intelligent mobile app, financial OpenBanking data sources, state-of-the-art intelligent analytics and a real-time recommendation engine to deliver personalised financial guidance to young and vulnerable adults.
As a founding member, you’ll get share options in an innovative business, supported by Innovate UK, Oxford Innovation and the SETsquared accelerator, with ambitions and a roadmap to scale internationally.
Supported by Advisors: Cambridge / FinHealthTech, Paypal/Venmo & Robinhood Brand Exec, Fintech4Good CTO & cxpartners CEO.
Working with: EPIC e-health programme for financial wellbeing & ICO Sandbox for ‘user always owns data’ approaches.
- Rate: £50-65K + Share Options
- Location: Flexible, remote working; Cornwall HQ
- Contact: gabriela@goodwith.co (please mention this list when you get in touch)
- Side reading: link