Machine Learning in the Big Data Era

The use of information has played a central role in the development of mankind, from the linguistic revolution some 70,000 years ago, through the agricultural revolution and the appearance of writing, the formation of the global economy and politics, to the Big Data era in which we live and the rise of the “Information Religion” (www.haaretz.com/magazine/the-edge/1.1934116).

In the 1960s, with the emergence of DBMS, users began to use the digital information at their disposal to answer relatively simple questions and make business decisions. The 1980s and 1990s brought two key milestones, RDBMS and OLAP, tools that paved the way for the business intelligence world. As a result, data queries evolved significantly, enabling users to explore more sophisticated and complex information.

Settling for what we already have has never been the strong suit of the human race :). And so over the years, in small evolutionary steps, we entered the era of Google and were exposed to a new and fascinating field: machine learning. This article is intended to provide basic background in the field and to explain its key concepts.


What is Machine Learning

If business intelligence gives you answers to specific questions, machine learning lets you gain insight into things you were not necessarily looking for. With the massive amounts of data companies face today, it is easy to miss key points. Machine learning is a tool that filters the “noise” out of the data, so that you can extract non-explicit information, discover hidden connections, understand patterns and trends, and even predict new ones.


For example, using business intelligence, a supermarket chain can analyze the purchases of the past year and ask: in the Dan region, for the 30-40 age group, what was the most profitable product, and which promotion boosted it most effectively?
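A business intelligence question of this kind boils down to filtering and aggregating known fields. The following is a minimal sketch of that idea; the rows, field layout, and the function name `most_profitable_product` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, customer_age, product, profit)
SALES = [
    ("Dan", 34, "coffee", 120.0),
    ("Dan", 38, "olive oil", 95.0),
    ("Dan", 31, "coffee", 80.0),
    ("Dan", 52, "wine", 300.0),
    ("North", 35, "wine", 250.0),
]

def most_profitable_product(rows, region, min_age, max_age):
    """Sum profit per product for one region and age range, return the top product."""
    totals = defaultdict(float)
    for row_region, age, product, profit in rows:
        if row_region == region and min_age <= age <= max_age:
            totals[product] += profit
    # None when no row matches the filter.
    return max(totals, key=totals.get) if totals else None

best = most_profitable_product(SALES, "Dan", 30, 40)
```

The question is fully specified in advance (region, age range, measure), which is what distinguishes it from the pattern discovery that machine learning aims at.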

Using machine learning, the chain can discover patterns it was not necessarily aware of. For example: which products are most likely to sell together? Based on past sales, what is the sales forecast for the next quarter? Based on a client’s past purchases, what can be inferred about their consumer behavior? Which products can they be expected to buy next time? All of these insights empower the company and provide it with a competitive advantage.


Machine learning algorithms

So how do you discover new patterns and trends? Machine learning is based on algorithms. A machine learning algorithm is a set of predefined calculations that generates a pattern. To create that pattern, the algorithm analyzes the data provided to it and looks for specific types of patterns, cycles, or new trends. Each algorithm allows for a different form of analysis and is suitable for different purposes.

The algorithms can be divided into two main groups. The first is Supervised Learning, whose algorithms are designed to produce forecasts. The second is Unsupervised Learning, whose algorithms are designed to understand the relationships between data items and to group them. We will briefly review a number of key algorithms.


Decision Trees

The Decision Tree algorithm, from the supervised learning family, examines the effect of different attributes on a particular target variable. Using these attributes, the records are split into a defined number of groups, and the sequence of splits forms a path for decision-making.

A beauty care company wants to launch an advertising campaign in which it will approach its existing customers and offer them a free trial kit of its new products. Since each kit is expensive, the company wants to contact only the customers most likely to purchase the full kit (for a fee).

In the past, the company ran a similar campaign, which was a significant success among certain customers. The company wants to identify the characteristics of those customers and target them.


[Figure: decision tree]




The company decided to use the following predictors to segment its customers: age, gender, and place of residence (all of its customers live in Jerusalem or Tel Aviv). Based on these characteristics, it found that women in the 30-39 age group who live in Tel Aviv and women in the 40-49 age group who live in Jerusalem are the customers who responded most positively to this form of advertising in the past. Relying on these findings, the company can target only the customers most likely to respond.
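The segmentation described above can be sketched as a tiny decision tree built from past campaign results. This is an illustration under stated assumptions, not the company's actual method: the records are invented, the names `build_tree` and `predict` are hypothetical, and Gini impurity is used as one common splitting criterion among several.

```python
from collections import Counter

# Hypothetical past-campaign records: (age_group, gender, city, bought_full_kit)
RECORDS = [
    ("30-39", "F", "Tel Aviv", True),
    ("30-39", "F", "Tel Aviv", True),
    ("30-39", "F", "Jerusalem", False),
    ("30-39", "M", "Tel Aviv", False),
    ("40-49", "F", "Jerusalem", True),
    ("40-49", "F", "Jerusalem", True),
    ("40-49", "F", "Tel Aviv", False),
    ("40-49", "M", "Jerusalem", False),
    ("20-29", "F", "Tel Aviv", False),
    ("20-29", "M", "Jerusalem", False),
]
FEATURES = {"age": 0, "gender": 1, "city": 2}  # feature name -> column index
LABEL = 3  # column holding the target variable

def gini(rows):
    """Gini impurity of the labels in rows (0.0 means all labels agree)."""
    counts = Counter(row[LABEL] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

def split(rows, col):
    """Group rows by the value they hold in column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups

def build_tree(rows, features):
    """Recursively split on the feature with the lowest weighted impurity."""
    if gini(rows) == 0.0 or not features:
        # Leaf: predict the majority label of the remaining rows.
        return Counter(row[LABEL] for row in rows).most_common(1)[0][0]

    def weighted_impurity(name):
        groups = split(rows, features[name])
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())

    best = min(features, key=weighted_impurity)
    remaining = {k: v for k, v in features.items() if k != best}
    branches = {value: build_tree(group, remaining)
                for value, group in split(rows, features[best]).items()}
    return (best, branches)

def predict(tree, customer):
    """Follow the tree for a customer dict; unseen values default to False."""
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches.get(customer[name], False)
    return tree

TREE = build_tree(RECORDS, FEATURES)
likely_buyer = predict(TREE, {"age": "30-39", "gender": "F", "city": "Tel Aviv"})
```

Each path from the root to a leaf corresponds to one customer segment, so the tree can be read directly as a set of targeting rules.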



Clustering

The Clustering algorithm, from the unsupervised learning family, is designed to locate clusters of data items that share the same characteristics. Clustering can also be used to identify outliers, which is why it is often applied in fraud detection processes.

A credit-granting company that manages a large number of customers wishes to ensure that there is a correlation between an applicant's level of income and the ceiling of the loan requested. In addition, the company wants to avoid fraud, for example, situations where people claim that their salary is higher than it actually is in order to get a larger loan or better terms.
Based on its current data, the company has grouped workers by job type and mapped them on a graph with two axes: salary and position. The graph below shows groups of people from various technical support jobs. For example, the red “cloud” represents team managers and roughly outlines their pay range.


If a person who identifies as a technical support worker claims to earn well above the typical range for that role, the company can flag the application as possible fraud.
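The idea of grouping salaries and flagging claims that fall far from every group can be sketched as follows. The article does not name a specific clustering algorithm, so plain one-dimensional k-means is used here as one common choice; the salary figures, the starting centroids, and the `tolerance` threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical monthly salaries (in thousands) of technical support staff.
SALARIES = [7.5, 8.0, 8.2, 8.8, 9.0, 9.5, 14.0, 14.5, 15.0, 15.5, 16.0]

def assign(values, centroids):
    """Put each value into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters

def kmeans_1d(values, centroids, max_iterations=20):
    """Plain k-means on one dimension, starting from the given centroids."""
    for _ in range(max_iterations):
        clusters = assign(values, centroids)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)

def is_suspicious(claim, centroids, clusters, tolerance=3.0):
    """Flag a claimed salary that lies far outside its nearest cluster.

    Assumes the nearest cluster is not empty.
    """
    nearest = min(range(len(centroids)), key=lambda i: abs(claim - centroids[i]))
    spread = pstdev(clusters[nearest])
    return abs(claim - centroids[nearest]) > tolerance * spread

# Start from the lowest and highest salary as initial centroids (two clusters).
CENTROIDS, CLUSTERS = kmeans_1d(SALARIES, [min(SALARIES), max(SALARIES)])
flagged = is_suspicious(30.0, CENTROIDS, CLUSTERS)
```

In practice the threshold would be tuned on real data, and a flagged claim would be a trigger for manual review rather than proof of fraud.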




Association Rules

The Association Rules algorithm, from the unsupervised learning family, enables the discovery of relationships between data items. Different variations of this algorithm can be found in many places, from IMDB and Netflix, which are able to suggest movies based on previous content that the customer was interested in, through Amazon, which suggests products based on previous purchases, to the retailer that discovered a teenage girl was pregnant before her father learned about it :).
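A minimal sketch of how such relationships can be measured: for every pair of items, compute the support (how often the pair appears together) and the confidence (how often the second item appears given the first). The baskets, thresholds, and the function name `pair_rules` are hypothetical, and real systems use more scalable algorithms over larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

RULES = pair_rules(BASKETS)
```

With these sample baskets, only the bread and butter pair passes both thresholds, in both directions.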


[Figure: association rules]


Time Series Models

Data series collected over time can be found in many industries. The most striking characteristic of such data is the positive correlation between adjacent observations. For example, if today’s sales were relatively high, chances are that tomorrow’s sales will also be higher than usual, and vice versa.

There are models and algorithms from the supervised learning family that take advantage of this positive correlation between observations over time in order to predict future trends.
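To make the idea concrete, here is a minimal sketch that measures the correlation between adjacent observations and uses it for a simple forecast. It assumes a first-order autoregressive model, x[t] = intercept + slope * x[t-1], fitted by least squares; the sales figures and function names are hypothetical, and production forecasting models are considerably richer.

```python
from statistics import mean

# Hypothetical daily sales figures.
SALES = [100, 104, 109, 107, 112, 118, 121, 119, 125, 130]

def lag1_autocorrelation(series):
    """Correlation between each observation and the one right after it."""
    m = mean(series)
    numerator = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    denominator = sum((x - m) ** 2 for x in series)
    return numerator / denominator

def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + slope * x[t-1]."""
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = mean(prev), mean(curr)
    covariance = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    variance = sum((p - mean_prev) ** 2 for p in prev)
    slope = covariance / variance
    return mean_curr - slope * mean_prev, slope

def forecast(series, steps):
    """Roll the fitted model forward from the last observation."""
    intercept, slope = fit_ar1(series)
    value, predictions = series[-1], []
    for _ in range(steps):
        value = intercept + slope * value
        predictions.append(value)
    return predictions

correlation = lag1_autocorrelation(SALES)
next_three_days = forecast(SALES, 3)
```

A clearly positive `correlation` is what justifies predicting tomorrow from today; when it is near zero, a model of this kind has little to work with.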

For example, based on past fluctuations, what will the value of a particular stock be in 5 minutes? What will the value of an apartment on Ibn Gvirol Street in Tel Aviv be next year? zillow.com, for example, uses time series to predict housing prices in the US. What will the load on the server be in 10 minutes if the current trend continues? A hospital can ask, based on past data, how many beds are expected to be occupied this Friday. Companies can ask, based on revenue and expenditure data for the past six months, what budget should be set for the next quarter.


[Figure: time series]


To summarize

Machine learning is an information analysis technology that enables predicting or grouping data. Using different algorithms, we can uncover new patterns, trends, and relationships.

Machine learning is not a single step carried out by one entity in the organization, but rather a process that involves many stages: understanding the business need, identifying the relevant information, filtering and cleaning noise, choosing the algorithm, and evaluating the output.

UpScale Analytics is one of the largest platforms in the world for learning SQL by doing, with over 300 SQL exercises at different levels (including solutions), organized by topic, across more than 100 different datasets.