This is my new blog about Data Science. I hope you’ll enjoy reading it. If you find a mistake, have a request, or want to make a suggestion, feel free to comment here or contact me directly via email or LinkedIn.
Develop Spark code with Jupyter notebook - In-code comments are not always sufficient if you want to maintain good documentation of your code. Sometimes you would like to add equations, images, complex text formatting, and more. Of course, you can generate a “wiki” page for your project, but what would really be cool is if you could embed […]
Why K-Means is not always a good idea - One of the most basic building blocks in Data Mining is the clustering problem – given a set of untagged observations (which is why it is considered an unsupervised problem), the goal is to group them in such a way that observations of the same group (a.k.a. cluster) are more similar to each other than to those in other […]
Leverage Wikipedia to Build Smarter Applications - Intro: The data science world is full of interesting methods and algorithms for extracting hidden insights from (often) rebellious data sources. Because of the value it brings to a great many companies, it has not only become immensely popular in practice, but is also well studied and still evolving theoretically. While having a great amount of respect for getting acquainted […]
Optimize Linear Algebra Calculations with Native Libraries - Intro: Building large-scale machine-learning systems often involves a massive amount of linear algebra calculation under the hood. Whether you use Spark, R, or even plain old MapReduce code written in Java, you might end up doing some operations on a big matrix or vector. And those operations can be done 5x-7x faster! Contents: Intro, BLAS and LAPACK […]
How do you build a “People who bought this also bought that”-style recommendation engine - Collaborative Filtering: Collaborative Filtering (CF) is a method of making automatic predictions about the interests of a user by learning their preferences (or taste) from information about their engagements with a set of available items, along with other users’ engagements with the same set of items. In other words, CF assumes that, if a […]