Expert Briefings at PyDataGlobal, the GIL and DirtyCat
Did you know that Holiday Extras are hiring a Senior Data Engineer? See their ad and many more below.
I’m starting to experiment more with this newsletter. Some of you know that I have an infant - wee Kai is now 14 months (and we’re just starting nursery). He’s brill, I want to spend more time with him so - I’m going to work on growing this community.
In the coming months I’m going to experiment more with interviews with people I respect, getting some of the teams who advertise jobs to share educational tips back to you and possibly taking a more opinionated stance on the tools I mention. If you’re game for this journey - I’d like your feedback please. What are you here for? Are you job hunting, skilling-up or just keeping an eye on things? What would you like to see more (or less) of? Do you want to contribute something? Oh yes, I’m going to start to add some recurring sections too.
Thoughts
PyDataGlobal runs in 2 weeks and the schedule is online. Next week, prior to the conference, four of us are each running a Briefing session, open only to ticket holders. If you’d like to meet any of us, hear a 15 minute briefing on the state of the art and then join the discussion to ask your questions, please get a ticket! Please RT this to help spread the word.
- Monday - The State of the Art for Probabilistic Programming, Thomas Wiecki (pymc3 author)
- Monday - How To Ensure Responsible Use Of AI With A Real-World Example, Tariq Rashid (PyData long-time speaker, book author)
- Wednesday - The State of Higher Performance Python, yours truly (based on my High Performance Python book)
- Wednesday - Natural Language Processing: Trends, Challenges and Opportunities, Marco Bonzanini (author, PyDataLondon organiser)
I got myself interviewed on the TroubleShooting Agile podcast today on “de-silo-ifying data science teams”. Plugging individuals into the heart of the business was my top tip. I’ll post a link once the interview is up. What’s your top tip for getting a DSci out of the corner and effectively into the business? I’ll share them back here.
I taught another Higher Performance class last week. We got into a good discussion about the Global Interpreter Lock, notably on using Numba’s nogil option to release the GIL while a compiled function runs. This means you can use joblib with threads (not processes) to run some CPU intensive code in parallel without threads blocking each other on the GIL, which saves the spawn time for many processes. This is cool and I hope to have a Pandas speed-up to share as a consequence. Read deeper on the GIL here.
Open source and tools
Do you work on dirty text data for ML? Gael’s DirtyCat has had an update enabling “the SuperVectorizer: easily ingest a (possibly dirty) pandas dataframe in a machine-learning pipeline”. Getting your text knocked into shape via Pandas and into scikit-learn is now even easier.
If any of your colleagues ever question “is Python here for the long-term?” - point them at the TIOBE index, where Python has become the number 1 language. It had been hovering at 3rd place for a long time and has finally broken ahead of both C and Java.
Random
I quite like Tom Forth of The Data City; this tweet is nicely blunt (and the tooling he makes all looks rather good). I’ve never met him, but if you like open metropolitan/UK data then he’s worth a look. Also you can see a Roomba fly.
Footnotes
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I’m also on twitter, LinkedIn and GitHub.
Now some jobs…
Jobs are provided by readers. If you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it’ll go to all 1,400 subscribers 3 times over 6 weeks; subsequent posts are charged.
Data Scientist @ Good With
As the founding team data scientist, you’ll develop Good With’s intelligent data analysis and recommendation engines, supporting voice and natural language interaction with users.
Python and open source technologies are the overarching strategic choice for the data processing, analysis, machine learning and recommendation engines.
You’ll work at the heart of a dynamic multidisciplinary agile team to develop a platform and infrastructure connecting a voice-enabled intelligent mobile app, financial OpenBanking data sources, state of the art intelligent analytics and real-time recommendation engine to deliver personalised financial guidance to young and vulnerable adults.
As a founding member, you’ll get shares in an innovative business, supported by Innovate UK and Oxford Innovation, with the ambition and roadmap to scale internationally.
Supported by Advisors: Cambridge / FinHealthTech, Paypal/Venmo & Robinhood Brand Exec, Fintech4Good CTO & cxpartners CEO.
Working with: EPIC e-health programme for financial wellbeing & ICO Sandbox for ‘user always owns data’ approaches.
- Rate: £50-65K + Shares in the company
- Location: Flexible, remote working. Cornwall HQ
- Contact: gabriela@goodwith.co (please mention this list when you get in touch)
- Side reading: link
Researcher in Surrogate-Based Optimisation
The Computational Optimisation Group has a two-year research opening (either pre- or post-doctoral) in surrogate-based optimisation. The role intersects computational optimisation, machine learning, and open-source software.
- Rate: £41,593 - £49,210 (postdoctoral); £36,694 - £39,888 (predoctoral)
- Location: South Kensington, London
- Contact: r.misener@imperial.ac.uk (please mention this list when you get in touch)
- Side reading: link, link, link
Senior Python Engineer @ Semantic Partners
Senior Python Engineer - Knowledge Graph project for a major European Bank - Semantic Partners are seeking several skilled engineers with the following skillset - Python, Django, Flask etc, RESTful APIs, CI/CD, Containerisation, Docker, Kubernetes, NoSQL, BDD.
You’ll be joining a project team focusing on building a Knowledge Graph so an interest in Graph technologies and any experience of specific triple store systems would be a big plus, but more important is a desire to get into semantic engineering.
- Rate: 500-650/day
- Location: Remote
- Contact: dan.collier@semanticpartners.com (please mention this list when you get in touch)
- Side reading: link
Senior Data Engineer at Holiday Extras, Permanent, Kent/Remote
We believe that time is precious, so we create products, tech and services that make travel and holidays easy, simple and fun. Our purpose is clear: we offer customers less hassle so they can have more holiday. We are looking for Senior Data Engineers with big ideas: problem solvers and collaborators who love a challenge and are always striving to improve and grow. You’ll bring everyone along on the journey with you - sharing your knowledge, inspiring others so they can improve too. The Data Team is a small but growing team, meaning there’s lots of opportunity for you to get stuck in, help us progress and for you to learn and grow yourself. At an exciting time in our data journey, we’re working hard on clearing down the last of our legacy tech. We’re moving to a modern data stack: Airflow, Google BigQuery and Looker. The team’s work fuels our HEHA! App, enabling us to explode 7 data points into 1000, turning our customers’ trips into holiday experiences. Please visit the links below for further details on the opportunity.
- Rate:
- Location: Kent or Remote (visits once a quarter)
- Contact: stefanie.kennett@holidayextras.com (please mention this list when you get in touch)
- Side reading: link, link, link
Data Scientist & Senior Data Scientist at Overton.io, Permanent, London
Overton is looking for data scientists to join our small, dynamic team. Overton is a young company with big ambitions to help universities, think tanks and NGOs track how their research translates into real world policy, laws and regulations. Our platform allows users to search over 4.5m policy documents and understand how they link to each other, to academic papers and to individual authors. We use a wide range of techniques to clean and enhance our data, including entity recognition and linking, classifiers and document topic extraction as well as heuristic based approaches.
You’ll be helping with everything from developing new product updates to finding new data sources, experimenting with new ways to enrich the data and maintaining our existing pipelines. You will be fluent in Python and have experience with web scraping, machine learning pipelines and data analysis & reporting. Experience with data visualisation and front end development, familiarity with scholarly metadata, bibliometrics and/or knowledge of the academic, think tank or research impact space is a bonus. See link for details on how to apply.
- Rate: Competitive, dependent on experience
- Location: Central London
- Contact: jobs@overton.io (please mention this list when you get in touch)
- Side reading: link
Fintech Python developer/data engineer/data scientist at Sikoia Ltd, Permanent/Contract
Sikoia is an ambitious new fintech building a unified data platform and API marketplace for global financial services. Our mission is to make it simpler for fintechs, lenders, and corporates to embed financial innovation and automate their decisioning, from customer onboarding through to risk underwriting.
Our founders are from Softbank, JPMorgan and Experian. With VC funding from EarlyBird and Seedcamp, plus support from top fintech CEO angel investors, we are now building a small, top quality tech/data engineering/data science team. Based remotely or at our office in central London, this is an opportunity to shape our product and technology from the very beginning.
Our tech stack is C# and Python, running in Azure. We are leveraging co-development projects with our first clients to build out our core platform. We have partnerships with UK credit bureaus and Open Banking providers, and are adding financial data vendors from many other countries. If you’re a mid/senior level Python developer or data engineer/data scientist, have fintech or SaaS experience, and are excited about fintech and financial innovation, then join us on this journey.
- Rate: Competitive
- Location: London (Holborn/Soho) or remote
- Contact: stephen@sikoia.com (please mention this list when you get in touch)
- Side reading: link, link
Traffic Simulation Researcher at Vivacity Labs
We are looking for an enthusiastic and talented researcher to help us build the next generation of traffic simulation systems.
You will be researching cutting-edge techniques from the fields of data science, computer science, and software development, and applying them to this domain. Your work will help to make simulation a cost-effective tool which can be used ubiquitously across the mobility ecosystem to solve a broad range of problems in transport planning, scheme design and appraisal, and operational control.
- Rate: £45,000-60,000 p/a
- Location: Kentish Town, London
- Contact: joinus@vivacitylabs.com (please mention this list when you get in touch)
- Side reading: link, link, link
Data Engineer at Tasman Analytics
We are hiring a Data Engineer to join our team to help us set up the components of our clients’ data platforms (such as data feeds, data warehouses, ETL infrastructure, etc). You’d also design and develop client-specific SQL data models that produce clean, structured and meaningful data sets for the business and other data functions (using dbt). On top of that, you’d help us build ETL scripts in Python to extract data from APIs or perform pipeline transformations.
You’ll work with some of the most ambitious companies in the D2C startup ecosystem, including Pollen, On Deck, Ecosia, PensionBee, and Curio Labs. We offer a competitive salary (this is a junior role, range 45-55k) and a ton of benefits (enhanced parental leave, generous pension scheme, gym membership, refreshment allowance, home office allowance).
- Rate: 45-55k
- Location: Remote
- Contact: thomas@tasman.ai (please mention this list when you get in touch)
- Side reading: link
Data Engineer/Scientist - Beatchain
Beatchain is a music distribution and social media marketing and analytics platform that works with up-and-coming artists as well as established record labels. We are looking for a data engineer/scientist working in Python to help users manage, understand and put their data into context. Data sources include social media and music platforms scraped over hundreds of thousands of accounts using Scrapy, APIs including over two million Spotify playlists, and large quantities of streaming data from our distribution and record label partners.
This is a junior to mid-level role. You would be working within a small back-end team alongside the lead data scientist. While the day-to-day ingestion and transformation of data is maintained, we research ways of presenting data to users through visualizations and predictive analytics. Recently, we used graph embeddings to model relationships between artists and genres to recommend related artists for social media campaigns. We use the familiar PyData Python/Pandas/NumPy stack deployed via AWS Lambda, Step Functions and Batch. Data lives in AWS RDS, DynamoDB and Redshift, migrating to Google BigQuery.
- Rate: Up to £40K, subject to experience.
- Location: Central London / Remote from a suitable time-zone
- Contact: ed.godshaw@beatchain.com (please mention this list when you get in touch)
- Side reading: link
Data Scientist (various levels)
Here at Gousto, we are on a mission to become the UK’s favourite way to eat dinner!
We’re hiring for multiple Data Science positions:
- Principal Data Scientist (Menu) https://apply.workable.com/gousto/j/3C7165186A/
- Principal Data Scientist (Supply) https://apply.workable.com/gousto/j/4709837FEC/
- Data Scientist (Growth) https://apply.workable.com/gousto/j/C9F991E124/
If you want to work on some seriously interesting projects and get discounted Gousto boxes as part of the benefits package, please apply using the links above, mentioning that this newsletter sent you there!
See here https://www.gousto.co.uk/jobs for benefits and check out our blog: https://medium.com/gousto-engineering-techbrunch
- Rate:
- Location: London (partly/mostly remote if you wish)
- Contact: marco.gorelli@gousto.co.uk (please mention this list when you get in touch)
Traffic Data Insights Lead at Vivacity Labs, Permanent, London
At Vivacity, we make cities smarter. We gather real-time data from our sensors to reduce congestion, spot dangerous manoeuvres on the road to improve safety, and support autonomous vehicles.
You will join our existing Product team and actively shape the product vision and technical roadmap to ensure we are constantly innovating and meeting our users’ data needs.
- Rate: 55,000-75,000
- Location: Kentish Town, London
- Contact: joinus@vivacitylabs.com (please mention this list when you get in touch)
- Side reading: link, link, link
Python Data Developer at Kindred Group, Permanent, London
Kindred’s ambition is to be the most insight-driven gambling company and in the last few years we’ve invested heavily in our data and analytics capabilities. We are now at the next stage of our journey, embarking on an initiative to enhance our sports and racing modelling and quantitative analysis capabilities.
The Quantitative Team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. This work builds upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are looking for a software engineer with a strong interest in sporting applications and experience in building solutions to handle varied external data sources. On joining, you will be responsible for creating exceptional quality data products, primarily based on sports event and market odds data, for use within the Quantitative Team and the wider business. Your work will be integral in the team’s delivery of market-leading probability and machine learning models to support our commercial and operational functions and decision making processes.
- Rate: Highly Competitive
- Location: Wimbledon, London
- Contact: jack.morrow@kindredgroup.com (please mention this list when you get in touch)
- Side reading: link, link
Quantitative Analyst - Kindred Group, Permanent
Kindred’s ambition is to be the most insight-driven gambling company and in the last few years we’ve invested heavily in our data and analytics capabilities. The Quantitative Team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. The work will build upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are now looking for a talented Quantitative Analyst to join our team to help shape our sport and racing modelling efforts. This role provides an exciting opportunity to be a pivotal part of the team. On joining, you will be responsible for performing data analysis and building probability and machine learning models to derive descriptive and predictive insight about sporting events. Your work will help to deliver market-leading tools and capabilities to support our commercial and operational functions and decision making processes.
- Rate: Highly Competitive
- Location: London, Wimbledon
- Contact: shanice.Tatter@KindredGroup.com (please mention this list when you get in touch)
- Side reading: link, link
Lead Data Scientist - Kindred Group
Kindred Group use data to build solutions that deliver our customers the best possible gaming experience and we have ambitious plans to get smarter in how we use our data. As part of these plans we’re looking to recruit a Lead Data Scientist to drive our advanced analytics initiatives and build innovative solutions using the latest techniques and technologies.
Key Accountabilities
• To lead, manage and deliver our advanced analytics initiatives using cutting edge techniques and technologies to deliver our customers the best online gaming experience.
• Working in cross functional teams to deliver innovative data driven solutions.
• Able to advise on best practices and keep the company abreast of the latest developments in technologies and techniques
• Building machine learning frameworks to drive personalisation and recommendations.
• Building predictive models to support marketing and KYC initiatives.
• Continually improving solutions through fast test and learn cycles
• Analysing a wide range of data sources to identify new business value
• Be a champion for advanced analytics across the business, educating the business about its capability and helping to identify use cases
- Rate: Competitive
- Location: London, Wimbledon
- Contact: Shanice.Tatter@kindredgroup.com (please mention this list when you get in touch)
- Side reading: link, link
Software Engineer IV, Recommenders, Elsevier
Recommenders is Elsevier’s suite of recommendation systems, which uses Data Science and machine learning techniques to keep researchers apprised of developments in their field and new funding opportunities, and to help them find peer reviewers and papers related to their work. We’re looking for a data engineer to help us build the pipelines which extract features from the unparalleled collection of research data flowing through our systems.
You’ll be working in a modern technology stack (AWS, Scala, Spark, Kafka, we’re currently looking at SageMaker and Kedro) as part of a small cross-functional team. If you’re interested in learning more, please contact Stuart White at the email address below.
- Rate:
- Location: London
- Contact: s.white.1@elsevier.com (please mention this list when you get in touch)
- Side reading: link