DSC Weekly Digest 7 June 2021
Computer languages tend to rise and fall in popularity over the years, depending upon what the job market looks like, what the hot technology du jour is, and what needs each language fulfills. There was a time in the not-so-distant past when Ruby, by way of the Ruby on Rails framework, was the must-have language out there, yet Ruby now seldom cracks the top 20 languages in most people’s surveys. I can even remember a time when LISP was the dominant language in the artificial intelligence space, though you’re more likely today to find LISP only as faint echoes in languages like Erlang and Clojure.
If you look through older articles on DSC you’ll find plenty of fodder about whether R or Python is the better language to learn, though by the numbers Python looks to be eclipsing R finally in the great language religious wars. However, the reality is that in the analytics space, your language choice is becoming less and less relevant.
This is partially due to the fact that most language vendors (and database vendors) are increasingly incorporating analytics libraries and extensions into their respective products and communities, especially in those places where you have the ability to compile these libraries into high-performance code.
For instance, React Native (a JavaScript-based framework) is increasingly taking on analytics and machine-learning loads that would have been unthinkable even a couple of years ago. The ability to push analytics processing capabilities out to edge devices and various web browsers is also changing the nature of the game, especially as web applications increasingly become serverless.
Similarly, DevOps is mingling analytics code and machine learning models with robotic process automation, using JavaScript as the preferred glue. Additionally, you’re seeing more web-based analytics suites, where the need to write any formal algorithms is diminishing rapidly. You still need to understand what the mathematical tools are supposed to do (and how they should be applied), but the days when data scientists spent all their time writing R scripts are likely in the rearview mirror.
Consequently, the best answer to the question of which languages to learn for working as a data scientist is to learn at least one, whether that be R, Python, JavaScript, Scala, Java, Clojure, Haskell, or even LISP (okay, perhaps not LISP, though it is a cool language … Scheme might be better), but more importantly, learn how to write code in general. Become proficient with query languages, including SQL, SPARQL, GraphQL, and XQuery, because that’s where you’re going to spend the bulk of your time. Learn to think declaratively, because declarative programming is more useful for working with data in general than imperative programming is. Finally, don’t discount Excel or other spreadsheets: a surprising amount of work in the analytics space is STILL done in Excel, and you will almost certainly end up having to work with it in some capacity.
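To make the declarative-versus-imperative point a little more concrete, here is a minimal sketch in Python that computes the same per-region totals twice: once with an explicit loop, and once with a SQL query run through the standard library's sqlite3 module. The sales figures, the table name, and both function names are invented purely for illustration; they are assumptions, not drawn from any real dataset.

```python
import sqlite3

# Hypothetical sample data: (region, amount) pairs invented for illustration.
SALES = [
    ("east", 120.0),
    ("west", 80.0),
    ("east", 30.0),
    ("west", 45.5),
    ("north", 10.0),
]


def totals_imperative(rows):
    """Imperative style: spell out HOW to build the totals, step by step."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    # Sort by region name so the result is easy to compare.
    return dict(sorted(totals.items()))


def totals_declarative(rows):
    """Declarative style: state WHAT is wanted and let the SQL engine decide how."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_imperative(SALES))
    print(totals_declarative(SALES))
```

Both functions should return the same totals. The difference is that the query describes the result (grouped sums, ordered by region) rather than the procedure, which is the habit of mind that carries over to SPARQL, GraphQL, and XQuery as well.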
These issues and more are covered in this week’s digest. This is why we run Data Science Central, and why we are expanding its focus to consider the breadth and depth of digital transformation in our society. Data Science Central is your community. It is a chance to learn from other practitioners, and a chance to communicate what you know to the data science community overall. I encourage you to submit original articles and to make your name known to the people who are going to be hiring in the coming year. As always, let us know what you think.
In medias res,
Kurt Cagle
Community Editor,
Data Science Central