About
If you switch from Web Development to Data Science, it almost automatically means you have to become a Python programmer.
That's fine (I know this won't win me any likes from Rubyists), but Ruby is trying too.
Ruby is (for now) not a Data Science-centric language, and it lacks a large, established set of libraries. We don't have an ecosystem like Python's SciPy. I don't see any serious reasons for this besides historical ones. This is my pain.
Lol. Why do you even try to use Ruby for Data Science tasks?
The reason is very simple: I love using Ruby and I love its syntax. So I suppose that in some simple cases Ruby is not a bad tool.
The Good
What has been done so far:
- Some time ago, the Practical AI organization released a series of articles about AI and Machine Learning in Ruby, but the posts were very naive, with "Hello World" level tasks, and the project ended its life in October 2017.
- There are some things like tensorflow.rb, gems based on PyCall, disconnected standalone libraries, and articles about very narrow solutions such as captcha solving, image recognition, etc.
Each of them can solve a single, isolated task, but they die as soon as it comes to integration with another library or with something bigger. There's no interface to rule them all.
The Ugly
Ruby + Daru + Numo-Narray != Python + NumPy + SciPy + Pandas
One day I decided to play with data and solve this Kaggle task without using Python or R. Only Ruby, only hardcore.
Quite a stupid idea, but it led to my first PR in Ruby/CSV and to more questions about an interface that could connect all the Ruby gems. So the experience was useful.
I found an easy tutorial that used Python and started to reproduce it step by step in Ruby. Every step in Ruby was a clear failure compared to Python: much clumsier to write and much slower to get done. Where Python needs 1 LOC, Ruby needs 3-4 lines.
Here's an interesting link comparing Python and Ruby for data refinement, to back up my words with real code.
The Bad
Framework for everything.
As I mentioned above, the critical issue is that the DS gems are fragmented and unrelated to each other.
It is how proper Ruby gems work: they solve one task, and solve it well (or die trying). Our community is not building some "framework for everything".
© Ruby Community
Our problem is an unreasonable fear of dependency hell, or maybe of implementation overhead; I don't know exactly which.
Python solved this problem easily. The SciPy ecosystem is a bunch of small, compatible libraries stitched together by one common interface: NumPy.
Right way: we just need to build a meta-gem that combines and installs everything, some kind of interface that bundles the gems together. In my view of this world, that's the solution that makes things simpler :)
Current way: If someone needs only a part of a library and sees the whole bunch of other implementations as overhead, they can extract the code and build a targeted gem.
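To make the meta-gem idea a bit more concrete, here is a minimal sketch of what its specification could look like. The gem name `datasci_bundle`, the version, and the particular list of dependencies are all hypothetical, picked only for illustration; the point is that such a gem ships no code of its own and only pulls the others in.

```ruby
require 'rubygems'

# Hypothetical meta-gem: the name, version and dependency list below are
# assumptions for illustration, not an existing project.
META_SPEC = Gem::Specification.new do |spec|
  spec.name    = 'datasci_bundle'
  spec.version = '0.0.1'
  spec.summary = 'Installs a set of Ruby data science gems in one go'
  spec.authors = ['Example Author']
  spec.files   = [] # no code of its own, only dependencies

  # Installing the meta-gem installs everything listed here.
  spec.add_dependency 'daru'
  spec.add_dependency 'numo-narray'
  spec.add_dependency 'distribution'
end

puts META_SPEC.dependencies.map(&:name).join(', ')
```

Bundling the gems this way only solves installation. The common interface between them would still have to be designed separately.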
What can we do?
The whole pain is that almost no one uses Ruby for mathematics.
In my opinion, NumPy's interface is the most successful one in this respect. In Ruby, we just need to re-implement the same interface. For instance, drawing a matrix from a distribution currently involves tons of extra code, and it should be as quick to do as possible.
Okay, even if we don't build a shiny new framework for Data Science, it would be great if every future commenter translated one method from sklearn into Ruby. This world would become much better.
Just for comparison
Python today:
import numpy as np
np.random.normal(mu, sigma, size)
Ruby today:
require 'daru'
require 'distribution'
Ruby tomorrow:
require 'daru'
Daru::Vector.new(size, normal(mu, sigma)) # your syntax can be here
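For what it's worth, the NumPy call above can be roughly approximated with nothing but Ruby's standard library. This is only a sketch: the `RandomSketch` module and its `normal` method are hypothetical names, not part of Daru or any existing gem, and the Box-Muller transform is just one simple way to get normally distributed samples.

```ruby
# Hypothetical helper, not part of Daru or any existing gem.
module RandomSketch
  # Returns an Array of `size` samples drawn from a normal distribution
  # with mean `mu` and standard deviation `sigma` (Box-Muller transform).
  def self.normal(mu, sigma, size, rng: Random.new)
    Array.new(size) do
      u1 = 1.0 - rng.rand # lies in (0, 1], so Math.log never receives 0
      u2 = rng.rand
      z  = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
      mu + sigma * z
    end
  end
end

# Same argument order as np.random.normal(mu, sigma, size).
samples = RandomSketch.normal(10.0, 2.0, 5, rng: Random.new(42))
puts samples.inspect
```

Passing a seeded `Random` makes the output reproducible, which is handy when comparing results step by step against a Python tutorial.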
P.S.
I have already made my peace with it: I have been writing Python for a long time and have even found many advantages in it.
this bad boy can fit so many behaviors in it
Sooner or later you will get hooked on the machine learning needle anyway, so you might as well start right now: https://www.youtube.com/watch?v=T1nFQ49TyeA
Or if you're already involved in DS or just starting your path, try going through this blitz about deep learning: http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html