What can Data Scientists / Machine Learning Specialists learn about their peers from Stack Overflow’s annual Developer Survey?

Peter Schuld
4 min read · Jan 27, 2021


Introduction

Successfully breaking into the field of Machine Learning or accelerating your career in Data Science depends on your technical skill set. However, deciding which programming languages, tools and technologies to pursue is notoriously difficult because of the large number of possible combinations. Choosing with confidence requires visibility of trends not only in the global software industry, but also in your relevant peer group. Stack Overflow’s annual Developer Survey is the largest survey of software developers in the world, providing us with market trends and with meaningful data on subgroups such as Data Scientists / Machine Learning Specialists.

Part I: Additional roles Data Scientists / Machine Learning Specialists perform

Data Scientists / Machine Learning Specialists frequently perform 3 to 5 additional developer roles. Almost half are also Backend Developers and 37% are Fullstack Developers. Other popular roles are Data/Business Analyst (33%) and Data Engineer (30%). About a third are Academic Researchers or Scientists.
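
Shares like these can be derived by counting which other roles appear alongside the target role in each response. The following is a minimal, dependency-free sketch and not the code used for this article. It assumes that each respondent's roles are stored as one semicolon-separated string, and the role labels and sample answers are made up for illustration.

```python
from collections import Counter

# Assumed label for the target role; the real survey label may differ.
TARGET = "Data scientist or machine learning specialist"


def co_role_shares(dev_types, target=TARGET):
    """Return (share per additional role, mean number of additional roles)
    among respondents whose role list contains `target`."""
    counts = Counter()
    n_target = 0
    extra_total = 0
    for raw in dev_types:
        if not raw:  # skip missing answers
            continue
        roles = {r.strip() for r in raw.split(";") if r.strip()}
        if target not in roles:
            continue
        n_target += 1
        others = roles - {target}
        extra_total += len(others)
        counts.update(others)
    if n_target == 0:
        return {}, 0.0
    shares = {role: n / n_target for role, n in counts.items()}
    return shares, extra_total / n_target


# Made-up answers, for illustration only.
sample = [
    "Data scientist or machine learning specialist;Developer, back-end",
    "Data scientist or machine learning specialist;Developer, back-end;Data or business analyst",
    "Developer, front-end",
    "Data scientist or machine learning specialist",
    None,
    "Data scientist or machine learning specialist;Engineer, data",
]

shares, mean_extra = co_role_shares(sample)
for role, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{role}: {share:.0%}")
print(f"Mean number of additional roles: {mean_extra:.1f}")
```

The same counting approach would work for any multi-select survey question, for example the languages or tools a respondent works with.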

Part II: Programming languages Data Scientists / Machine Learning Specialists use

Python is by far the most popular programming language for Data Science and Machine Learning. In addition, the relational database language SQL is used by 60% of Data Scientists / Machine Learning Specialists, and half use HTML/CSS and JavaScript for web development. About 40% use Bash/Shell/PowerShell on the command line and 28% use the R language for data analysis. Already 4% use Julia, a language that first appeared in 2012. On average, Data Scientists / Machine Learning Specialists use 5 programming, scripting, and markup languages.

Part III: Popular tools of Data Scientists / Machine Learning Specialists

The most popular tools among Data Scientists / Machine Learning Specialists are Pandas (50%), TensorFlow (38%), Keras (27%), Node.js (26%) and Torch/PyTorch (20%).

Let us now turn to broader trends for all developers.

Part IV: Changing popularity of programming languages

Over the last two years, the order of the 5 most popular languages used by all developers has remained unchanged, but all five have lost share. JavaScript (-2.2%) remains the most popular programming language among developers, followed by HTML/CSS (-2.0%), SQL (-2.3%), Java (-5.1%) and Bash/Shell/PowerShell (-6.7%). Languages that gained popularity over the same period are Python (+5.3%), TypeScript (+8.0%), Go (+1.7%), VBA (+1.2%), Kotlin (+3.3%), Rust (+2.7%) and Julia (+0.4%).
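
A comparison like this boils down to two checks: whether the ranking is the same in both years, and how far each share moved. The sketch below shows one way to do that, assuming the changes are measured in percentage points (the text above does not say how they were computed). The shares are made-up placeholder values, not survey results.

```python
def top_n(shares, n):
    """Return the n languages with the highest share, most popular first."""
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return [lang for lang, _ in ranked[:n]]


def point_changes(old, new):
    """Change in share (percentage points) for languages present in both years."""
    return {lang: round(new[lang] - old[lang], 1) for lang in new if lang in old}


# Placeholder shares in percent, for illustration only (NOT survey results).
shares_earlier = {"JavaScript": 60.0, "HTML/CSS": 50.0, "SQL": 40.0, "Python": 30.0}
shares_later = {"JavaScript": 58.0, "HTML/CSS": 48.0, "SQL": 38.0, "Python": 35.0}

same_order = top_n(shares_earlier, 3) == top_n(shares_later, 3)
changes = point_changes(shares_earlier, shares_later)

print("Top 3 order unchanged:", same_order)
for lang, delta in changes.items():
    print(f"{lang}: {delta:+.1f} points")
```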

The 5 languages developers most wanted to work with in 2020 are Python (46.2%), JavaScript (45.3%), HTML/CSS (35.9%), SQL (34.6%) and Java (22.9%).

Part V: Variables with a strong impact on salaries

Next, we used the 2020 answers from professional developers in the USA to train a linear machine learning model with Python's scikit-learn. We used numerical variables such as work experience (in years) and some of the survey’s categorical variables as input for the model. Our model accounts for 50% of the variation in salaries. We analysed the impact of developer roles, programming languages and the tools developers use. Professional developers who perform the role of Senior executive/VP have the highest expected salary of all developer types, while Backend Developers have the lowest. The role Data Scientist / Machine Learning Specialist has a moderately strong impact on salary. In contrast, the role of Data / Business Analyst has only a moderate impact on the response variable.
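
To illustrate what such a model looks like, here is a minimal sketch with scikit-learn. It is not the model behind the results above: it assumes ordinary least squares (the exact linear model is not specified here), uses synthetic data, and the column names `years_experience` and `role` as well as all salary figures are hypothetical. The score it prints is the share of salary variation explained on held-out data (R²).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data, for illustration only (NOT survey data).
rng = np.random.default_rng(0)
n = 400
years = rng.integers(0, 30, size=n)
role = rng.choice(["backend", "data_scientist", "executive"], size=n)
bonus = pd.Series(role).map(
    {"backend": 0, "data_scientist": 15_000, "executive": 40_000}
).to_numpy()
salary = 60_000 + 2_000 * years + bonus + rng.normal(0, 20_000, size=n)

X = pd.DataFrame({"years_experience": years, "role": role})
y = salary

# One-hot encode the categorical column; pass the numeric column through as-is.
preprocess = ColumnTransformer(
    [("role", OneHotEncoder(drop="first"), ["role"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, LinearRegression())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)

# R^2: fraction of the variation in salaries explained on the held-out set.
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.2f}")
```

In a model like this, the fitted coefficient of each one-hot column is the expected salary difference relative to the dropped baseline category, which is one way to read the "impact" of a role, language or tool.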

Surprisingly, niche programming languages like Dart or Julia have a strong impact on salary expectations. The two programming languages most popular with Data Scientists / Machine Learning Specialists, Python and R, have a moderately strong and a weak impact on expected salary, respectively. Knowledge of the command-line languages Bash/Shell/PowerShell has a moderate impact, and proficiency in SQL has only a weak impact on forecasted salary.

Using the machine learning tools Torch/PyTorch and Keras has a moderately strong impact on expected salaries, while using TensorFlow and Pandas has only a weak impact on the response variable. By comparison, proficiency in the Big Data tools Apache Spark and Hadoop has a moderate impact on expected salaries.

Conclusion

Analysing the annual Stack Overflow Developer Survey is inspired by our need to avoid complacency. Let us use the results to get a better understanding of the different roles Data Scientists / Machine Learning Specialists perform and to decide which technologies to pursue. Taking a closer look at Julia or Keras might well be worth the effort.

More details on this analysis are available in my GitHub repository: https://github.com/PeterSchuld/StackOverflowDeveloperSurvey

Data Source

The anonymized results of the survey are available for download under the Open Database License (ODbL): https://insights.stackoverflow.com/survey


Peter Schuld

Background in Economics and Asset Management. Financial Analyst and Data Science / Machine Learning enthusiast.