Loading parquet data quickly, James Powell's training, Human-Learn
Did you know that Kindred are looking for a Lead Data Scientist along with quants and Python developers? Details are below along with other Senior roles in 14 job adverts.
Last week I ran my last public Higher Performance Python training for the summer. We covered profiling and spoke a bunch about getting Pandas to go faster and stretching beyond plain Pandas with Dask and Vaex. I'm actually using a bunch of these ideas on a current Kaggle competition (Optiver Volatility) as we're limited to a 16GB dual core VM which is painfully slow. I'll report more on "what is working well" in the future.
One trick I've learned is that Pandas' read_parquet is substantially more memory efficient if you pass the path= argument (at least when using pyarrow, I haven't tested fastparquet) to read a set of partitions in 1 go vs reading them iteratively and glueing them together with a concat. A correctly-sized DataFrame is built ahead of time, rather than needing temporaries which increase RAM usage if you iterate. It also works out to be faster to load data this way too. Sometimes it is the little things that add up. Given that this dataset fitted into my laptop's RAM I was not surprised to see that the overhead of trying Dask negated any benefit of parallelism.
As a part of this, and whilst playing with new Pandas tools for a future intermediate-focused Pandas course, I've started to use pandas-vet which runs a set of sanity checks through flake8 to spot less-desirable Pandas code with solutions. By reducing the breadth of the Pandas API a little I can focus on deepening my knowledge around key APIs (rather than "use all of them").
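To give a flavour of the kind of rewrite such a linter nudges you towards, here is a small sketch. The data is made up, and the two patterns shown (in-place mutation and the .values accessor) are my understanding of what pandas-vet flags rather than an exhaustive or authoritative list.

```python
import pandas as pd

# Hypothetical toy data with one missing value.
sales = pd.DataFrame({"region": ["north", "south", "north"], "units": [3, None, 5]})

# Style a linter such as pandas-vet is expected to complain about
# (left commented out so the preferred version below is what runs):
#   sales.fillna(0, inplace=True)
#   units_array = sales["units"].values

# Preferred style: return a new object rather than mutating in place,
# and use the explicit to_numpy() accessor.
cleaned = sales.fillna({"units": 0})
units_array = cleaned["units"].to_numpy()
```

The original frame is left untouched, which makes it easier to reason about each step in a longer pipeline.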
James Powell has some upcoming training seminars on core Python topics, and he's shared with me both some discount codes and some free session codes (both first-come, first-served). The first of three events runs at 7 AM US/EDT, 12 PM GMT (he's based in New York) where "the focus is on writing “Precision Python”: the tightest, most effective, readable, and expressive approach to solving a problem". You can read about it here.
This link gets you 20% off and if you're really quick this link gets you a free ticket. If you get a free ticket - James would appreciate a tweet (his twitter is above) of thanks. Be nice and don't take the limited free tickets if you're not sure you can attend. Note that if you have any ticket problems - please contact James via the Eventbrite page (I'm just sharing the links for him).
The Call for Proposals for PyDataGlobal 2021 is open for another 10 days. If you'd like a chance to speak then get your talk idea in.
Do you have a library to share with 1,400 readers? Vincent Warmerdam has shared his human-learn ML toolkit for sklearn:
"Many classification problems can be done by natural intelligence too! This package aims to make it easier to turn the act of exploratory data analysis into a well-understood model. These “human” models are very explainable from the start. No need for explainability tools in hindsight. If nothing else, these models can serve as a simple benchmark representing domain knowledge which is a great starting point for any predictive project.
Human-Learn contains scikit-learn compatible tools that make it easier to construct and benchmark rule-based systems designed by humans. There are tools to turn Python functions into grid-searchable scikit-learn compatible components and interactive jupyter widgets that allow the user to draw models. One can also use these tools to design rules on top of existing models that, for example, can trigger a classifier fallback when outliers are detected."
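To make the idea of a grid-searchable, human-written rule concrete, here is a minimal sketch written with plain scikit-learn. It does not use human-learn's own API (the library ships its own components for this); the class name, the toy data and the threshold values are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


class ThresholdRuleClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical hand-written rule: predict 1 when a feature exceeds a threshold."""

    def __init__(self, column=0, threshold=10.0):
        self.column = column
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing is learned; fit only records the class labels for scikit-learn.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        return (X[:, self.column] > self.threshold).astype(int)


# Toy data: the label is 1 whenever the single feature is above 25.
X = np.arange(0, 50, dtype=float).reshape(-1, 1)
y = (X[:, 0] > 25).astype(int)

# Because threshold is an ordinary constructor parameter, GridSearchCV can tune it.
search = GridSearchCV(
    ThresholdRuleClassifier(),
    param_grid={"threshold": [5.0, 15.0, 25.0, 35.0]},
    cv=2,
)
search.fit(X, y)
best_threshold = search.best_params_["threshold"]
```

A rule like this can serve as the simple domain-knowledge benchmark the quote describes, scored with the same cross-validation tooling as any learned model.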
See recent issues of this newsletter for a dive back in time.
About Ian Ozsvald - author of High Performance Python (2nd edition), trainer for Higher Performance Python, Successful Data Science Projects and Software Engineering for Data Scientists, team coach and strategic advisor. I'm also on twitter, LinkedIn and GitHub.
Now some jobs…
Jobs are provided by readers. If you’re growing your team then reply to this and we can add a relevant job here. This list has 1,400+ subscribers. Your first job listing is free and it'll go to all 1,400 subscribers 3 times over 6 weeks; subsequent posts are charged.
Python Data Developer at Kindred Group, Permanent, London
Kindred's ambition is to be the most insight-driven gambling company and in the last few years we've invested heavily in our data and analytics capabilities. We are now at the next stage of our journey, embarking on an initiative to enhance our sports and racing modelling and quantitative analysis capabilities.
The Quantitative Team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. This work builds upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are looking for a software engineer with a strong interest in sporting applications and experience in building solutions to handle varied external data sources. On joining, you will be responsible for creating exceptional quality data products, primarily based on sports event and market odds data, for use within the Quantitative Team and the wider business. Your work will be integral in the team's delivery of market-leading probability and machine learning models to support our commercial and operational functions and decision making processes.
- Rate: Highly Competitive
- Location: Wimbledon, London
- Contact: jack.morrow@kindredgroup.com (please mention this list when you get in touch)
- Side reading: link, link
Quantitative Analyst - Kindred Group, Permanent
Kindred's ambition is to be the most insight driven gambling company and in the last few years we've invested heavily in our data and analytics capabilities. The quantitative team work closely with the existing data science function to play an important role in delivering a truly innovative and unparalleled experience for the customers of our sportsbook brands. The work will build upon a culture of “data as a product” to significantly extend our proof-of-concept efforts in this area.
We are now looking for a talented Quantitative Analyst to join our team to help shape our sport and racing modelling efforts. This role provides an exciting opportunity to be a pivotal part of the team. On joining, you will be responsible for performing data analysis and building probability and machine learning models to derive descriptive and predictive insight about sporting events. Your work will help to deliver market-leading tools and capabilities to support our commercial and operational functions and decision making processes.
- Rate: Highly Competitive
- Location: London, Wimbledon
- Contact: shanice.Tatter@KindredGroup.com (please mention this list when you get in touch)
- Side reading: link, link
Lead Data Scientist - Kindred Group
Kindred Group use data to build solutions that deliver our customers the best possible gaming experience and we have ambitious plans to get smarter in how we use our data. As part of these plans we’re looking to recruit a Lead Data Scientist to drive our advanced analytics initiatives and build innovative solutions using the latest techniques and technologies.
Key Accountabilities
• To lead, manage and deliver our advanced analytics initiatives using cutting edge techniques and technologies to deliver our customers the best online gaming experience.
• Working in cross functional teams to deliver innovative data driven solutions.
• Able to advise on best practices and keep the company abreast of the latest developments in technologies and techniques
• Building machine learning frameworks to drive personalisation and recommendations.
• Building predictive models to support marketing and KYC initiatives.
• Continually improving solutions through fast test and learn cycles
• Analysing a wide range of data sources to identify new business value
• Be a champion for advanced analytics across the business, educating the business about its capability and helping to identify use cases
- Rate: Competitive
- Location: London, Wimbledon
- Contact: Shanice.Tatter@kindredgroup.com (please mention this list when you get in touch)
- Side reading: link, link
Software Engineer IV, Recommenders, Elsevier
Recommenders is Elsevier’s suite of recommendation systems, which uses Data Science and machine learning techniques to keep researchers apprised of developments in their field and new funding opportunities, and to help them find peer reviewers and papers related to their work. We're looking for a data engineer to help us build the pipelines which extract features from the unparalleled collection of research data flowing through our systems.
You'll be working in a modern technology stack (AWS, Scala, Spark, Kafka, we're currently looking at SageMaker and Kedro) as part of a small cross-functional team. If you're interested in learning more, please contact Stuart White at the email address below.
- Rate:
- Location: London
- Contact: s.white.1@elsevier.com (please mention this list when you get in touch)
- Side reading: link
Data Scientist (NLP) at Climate Policy Radar, Permanent, London
Climate Policy Radar is a not-for-profit climate AI startup on a mission to map the global policy landscape, harnessing machine learning to create the evidence base for informed decision-making. Our work helps governments, the private sector, researchers and civil society to advance effective climate policies rapidly, replicate successful approaches and avoid failed ones, enhance accountability and promote data democratisation.
We are building the capability to collect and structure climate policy documents from all around the world. Now, at the beginning of this exciting journey, we need an exceptional individual with broad practical experience of ML and NLP to extract information from large and complex unstructured documents. You will need the creativity and passion to write the playbook, and be comfortable working in situations where uncertainty is high, defining the problems as much as the solutions. You will be willing to roll up your sleeves and dive deep into working on a wide range of areas, including the design of data labelling strategies, stakeholder collaboration and model deployment.
- Rate: £50k - 60k depending on experience
- Location: London
- Contact: jobs@climatepolicyradar.org (please mention this list when you get in touch)
- Side reading: link
ML Engineer - Data at Lean Tech Ltd
Lean provides Payment and Data APIs to unlock the financial technology sector and enable financial innovation in the Middle East.
We launched our first products to market at the beginning of 2021 and now support over 90% of the retail banking market in the UAE. With ambitions to build an entire ecosystem for Fintech in the region we're now looking to expand to new regions and support stakeholders from end-users, to Fintechs, regulators and financial institutions.
As we collect more raw data and enable an increasing variety of use cases, our data science products and processes will play an important role in Lean's advancement within the Fintech ecosystem. We are looking for an ML Engineer with a software engineering background and a strong interest in innovative financial applications. Your role will be to extract exciting and scalable features from the river of data that flows through our system.
- Rate:
- Location: Shoreditch, London
- Contact: nadia@leantech.me (please mention this list when you get in touch)
- Side reading: link
Senior & Lead Data Science vacancies - M&S
Here at M&S the data science function builds end-to-end AI and machine learning solutions in retail and e-commerce and helps our colleagues in Food, Clothing & Home, Fashion, Marketing, Loyalty, Supply Chain, Growth, Customer Services etc. drive value from data and create personalised experiences for our customers. We apply state of the art machine learning techniques to solve a variety of problems such as outfit recommendations in fashion, personalised offers for our loyalty program, pricing optimization, demand forecasting for supply chain, product waste management for retail, and AI powered campaigns for our marketing. We are hiring at both Senior and Lead levels. If you would be interested in finding out more, please contact me on the below email address.
- Rate: Competitive
- Location: London / Remote
- Contact: craig.parke@marks-and-spencer.com (please mention this list when you get in touch)
- Side reading: link
Senior Data Scientist at OVO
The OVO Group is a collection of companies with a single vision: to power human progress with clean affordable energy for everyone. The data which we collect is multi-faceted and complex. There is an opportunity to become a truly AI business, with market changing innovation and finely optimised processes leading to zero carbon and low cost for our energy customers.
We are looking for a Senior Data Scientist with hands-on experience building end-to-end data science products in a production setting. Primarily you will work within cross functional, product teams, but might be required to contribute to specific data science initiatives. You will take a lead role in demonstrating the value data science can add to teams across OVO Energy, working with stakeholders to understand their data science needs, owning the delivery of projects from start to finish, and evaluating value post delivery. You will also be expected to coach junior Data Scientists and help to define data science best practices. Technology stack: SQL, Python, GCP (BigQuery, Composer, Cloud Functions, Dataflow, CloudRun), CircleCI for CI, Github for version control.
- Rate: £65k-£85k depending on experience
- Location: London or Bristol
- Contact: heather-grace.blain@ovoenergy.com (please mention this list when you get in touch)
- Side reading: link, link
Consultant/Full-stack Developer
I'm looking for data-focussed software consultants on behalf of Sahaj.ai. Sahaj are a premium consultancy who focus on the intersection of data science, data engineering and platform engineering to solve complex problems for clients across a variety of industries. Their culture is built on trust and transparency, they have open salaries and there is a flat structure with no job titles or grades.
You can expect to work in small 2-5 people teams, working very closely with clients in iteratively developing and evolving solutions. You will play different roles and wear multiple hats, including analysis, solution design and coding.
You will have a passion for data and software engineering, craftsman-like coding prowess and great design and solutioning skills. You will be happy coding across the stack: front, back and DevOps, and have the desire and ability to learn new technologies and adapt to different situations. As a guide, you are likely to have 7 to 20 years+ experience.
- Rate: £60k-£120k dependent on experience
- Location: London (Tottenham Court Rd)
- Contact: davina@makeadifference.digital (please mention this list when you get in touch)
- Side reading: link
Data Scientist / Senior Data Scientist at FreeAgent, Permanent
FreeAgent removes the stress and pain of dealing with business finances, allowing business owners to focus on running their business. Our data science and data platform teams have created a machine learning model to categorise business banking transactions that’s currently applied to over 100,000 customers in production. We have big ambitions to further our use of machine learning and artificial intelligence and you could be a part of that! We primarily work with Python/pandas/scikit-learn and use AWS SageMaker to build and deploy our models and our regular company hack days and wiggle weeks provide a great opportunity for data scientists to pursue their own ideas.
- Rate: £34k - 65k + comprehensive benefits package
- Location: Edinburgh or hybrid/remote depending on experience
- Contact: jobs@freeagent.com (please mention this list when you get in touch)
- Side reading: link, link, link
Data Scientist
We are the fastest growing online travel agent in the UK, and we help people find their dream holiday. In the data science team, we collaborate with people across the company on business problems using programming, statistics, and machine learning.
If you love solving abstract problems, have an outstanding university degree, know SQL and Python, and have experience with machine learning, we want to hear from you!
- Rate: 50000 pa
- Location: Hammersmith or partly remote
- Contact: ben.auffarth@loveholidays.com (please mention this list when you get in touch)
- Side reading: link