Understanding Machine and Deep Learning
The underlying technologies and mathematical concepts used for Machine Learning (ML) are key to understanding how to apply it successfully. The aim of Machine Learning is to store, access and process vast amounts of data using database, network and processor components to generate insights or make decisions based on that data. The practice of Machine Learning is one of trial and error. Teams of Data Scientists, Engineers, Product Managers, Designers and Testers work in collaboration to define objectives, refine processes and create data sets that improve the series of operations performed by the machine as it ‘learns.’ The intention of this piece is to outline the key aspects of Machine and Deep Learning as they relate to successful implementations of these technologies.
This segment will cover:
1. The application of Machine Learning
2. Deep Learning as a subset of Machine Learning
3. Designing silicon brains and systems
The application of Machine Learning
There is a clear distinction between the whole of AI and the parts of AI that are considered to be Machine Learning. In general, AI becomes ML when specific data processing techniques are used to generate insights and make decisions based on, usually very large, data sets. Some ML techniques, like Online Learning, stretch this definition a bit. Online Learning updates a prediction algorithm with each new, unique data entry, and the entry does not have to be saved after the update, which minimizes overall data storage. In fact, several ML techniques aim to minimize data storage requirements, since storage can have large cost implications for the business and performance impacts for the system.
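As a minimal sketch of this idea, recent versions of scikit-learn expose a `partial_fit` method on `SGDClassifier` that updates the model one batch at a time; the `stream_of_batches` generator below is a hypothetical stand-in for a live data feed, not a real data source.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# SGDClassifier supports incremental updates via partial_fit,
# so each batch can be discarded once the model has seen it.
model = SGDClassifier(loss="log_loss")  # logistic-style online learner

def stream_of_batches(n_batches=100, batch_size=32, n_features=10):
    """Hypothetical stand-in for a live data feed."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] > 0).astype(int)  # toy labelling rule for illustration
        yield X, y

for X_batch, y_batch in stream_of_batches():
    model.partial_fit(X_batch, y_batch, classes=[0, 1])
    # X_batch and y_batch can now be dropped - nothing needs to be stored
```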
An example – given Xn, what is Y?
The core of all ML techniques is the use of prediction algorithms that generate expected results from a series of input parameters. A straightforward example: given a series of observed characteristics about a person, like their age and marital status, we ask a machine to tell us whether or not they are a student. We might want this information to complete partially filled census forms or to do a historical analysis of students in a given region. Given that we want a yes-or-no style response from the machine to the question ‘are they a student?’, we would likely apply one specific ML technique as we go through the overall ML process. The technique uses a Logistic Algorithm, a non-linear mathematical function best suited for Classification problems, to estimate the probability that a person in the dataset is a student.
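A minimal sketch of the logistic (sigmoid) function at the heart of this technique follows; the feature values, weights and bias below are invented purely for illustration, not learned from real census data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned weights for [age, is_married] plus a bias term.
weights = np.array([-0.08, -1.2])
bias = 2.5

person = np.array([34, 1])  # a 34-year-old married respondent
p_student = sigmoid(person @ weights + bias)
print(f"P(student) = {p_student:.2f}")  # ~0.19, so classify as 'not a student'
```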
As this technique is employed, the team working on the problem will need an initial dataset that includes the student status of all participants, so that the machine can be trained to predict correctly when this information is not provided. The error rate of the prediction algorithm is quantified in a function called the ‘cost function’, which is minimized over many repetitions that use different versions of the prediction algorithm and reduced or expanded data sets. When the cost function meets some success criterion – say, we need the equation to be correct 95% of the time – the prediction algorithm is considered successfully trained and ready for use.
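A sketch of that training loop, assuming scikit-learn and a synthetic stand-in for a labelled census extract (the real features and success threshold would come from the team's own objectives):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labelled census extract.
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression()  # minimizes the logistic (cross-entropy) cost
model.fit(X_train, y_train)   # training drives the cost function down

accuracy = accuracy_score(y_test, model.predict(X_test))
if accuracy >= 0.95:          # the 95% success criterion from the text
    print(f"Trained: {accuracy:.1%} accuracy - ready for use")
else:
    print(f"Only {accuracy:.1%} - revisit features, data or algorithm")
```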
Different types of prediction error
In order to train prediction algorithms correctly, teams need data sets that have captured a sufficient level of detail for each event and a sufficient number of events. This usually requires a combination of subject matter experts, who are able to identify which details are likely to matter in decision-making, and data science experts, who can bring fresh eyes to any industry or situation to discover new correlations or variables previously overlooked. The datasets used for Machine Learning are divided into three groups – the training, cross-validation and test data sets. This division exists to avoid the two discrete problems that can occur when creating prediction algorithms, visualized below.

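As a sketch of how that three-way division is made in practice – the 60/20/20 ratio here is a common convention, not a rule, and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a fully labelled dataset.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

# A common (but not universal) convention is a 60/20/20 split.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Train on the training set, tune model choices against the
# cross-validation set, and touch the test set only once,
# for a final unbiased error estimate.
```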
One of the reasons why incumbent companies have such an advantage in ML over potential new entrants is the vast data warehouses that they have been able to build up over time. Interested Google users can go to https://myaccount.google.com/data-and-personalization to see the data events that Google tracks as they relate to its search and advertising business. With this amount of data available and the advanced algorithms Google has in place, it is highly unlikely that any search provider will ever be able to deliver the same level of service as Google, unless it can take advantage of some new technology that leapfrogs the current ML systems Google uses. The same data advantage applies to state-supported businesses operating in China, where data privacy and ownership norms are looser than in other AI markets like the EU or USA.
Output goals and key ML techniques
The key focus of intelligent machines is that we apply them to tasks where they are significantly better than humans and to tasks that make us better at achieving some larger goal. A wide array of tasks fall into these buckets. To best design the systems and the maths needed to solve these tasks, we have developed many different models with varying degrees of usefulness in different situations. A good metaphor might be to think of Machine Learning as the field of martial arts. Within Machine Learning, there are different techniques that are suitable for different environments and modes of combat. If a team is designing a military drone program, they might use Krav Maga or Brazilian Jiu Jitsu, while a team building a financial fraud detection platform might use Tai Chi. In general, there is a sense of which technique or model is best suited for which situation, but there is definitely some freedom for operators to pick their preference.
There are three broad categories of Machine Learning that group together the different techniques based on how the dataset is fed to the machine and what the desired output is. Data processing and statistical inference discoveries are also brought to bear in many different aspects of ML.
Supervised Learning – here’s a dataset, give me an answer
The first type is Supervised Learning, where the machine is given a well-labelled, pre-processed dataset and is expected to generate a prediction or action based on the given input. A great real-life example is a spam filter – we all have this feature built into our email inbox, and it uses all the available information in an email to detect whether an incoming message is spam. There are two groups of techniques in this type of ML – regression, where the algorithm looks to predict a specific output value, like future prices, and classification, where the machine aims to sort data entries into probable groups based on pattern recognition, like spam or not spam.
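A minimal supervised-classification sketch in the spirit of a spam filter, using a naive Bayes model; the tiny hand-written corpus and its labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus, pre-labelled (1 = spam, 0 = not spam).
emails = [
    "win a free prize now", "limited offer click here",
    "meeting moved to 3pm", "lunch tomorrow?",
]
labels = [1, 1, 0, 0]

# Classification: turn words into counts, then learn which
# patterns of words signal spam.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

print(spam_filter.predict(["free prize offer"]))  # -> [1], flagged as spam
```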
Unsupervised Learning – here’s a dataset, tell me what you see
The second type is Unsupervised Learning, where the input datasets are less organized than in Supervised Learning, and the expected output is undefined. We use Unsupervised Learning when we want a machine to look at our data and tell us what patterns it sees that we should consider. Clustering is a technique within Unsupervised Learning where groups of similar entries are matched together; it is often used to analyze consumer shopping behaviours to design bundles or sales recommendations. If you’ve ever seen “shoppers also look at” or “inspired by your browsing history” while you’re shopping online, you’ve come across an application of Unsupervised Learning in real life.
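A sketch of clustering with k-means; the two shopper features and the choice of two clusters are invented for illustration, and real behavioural data would have many more dimensions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented shopper features: [visits per month, average basket value].
rng = np.random.default_rng(1)
shoppers = np.vstack([
    rng.normal([2, 20], [1, 5], size=(50, 2)),    # occasional, small baskets
    rng.normal([12, 80], [2, 10], size=(50, 2)),  # frequent, large baskets
])

# No labels are given - k-means finds the groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(shoppers)
print(kmeans.cluster_centers_)  # two behaviour profiles to build bundles around
```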
Reinforcement Learning – go make your own dataset and be good at something
The third big type of ML is Reinforcement Learning – if Supervised Learning is teaching a machine and Unsupervised Learning is the machine studying on its own, Reinforcement Learning would be the machine learning by doing. This aspect of ML focuses on instances where a machine is able to generate its own dataset through repeated attempts to understand its environment and how to successfully navigate it. Given the risks involved, this method of discovery is more appropriate in a virtual environment, as when DeepMind designed a pro eSports AI1, rather than allowing a self-driving car to learn by crashing.
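A toy sketch of one common Reinforcement Learning technique, Q-learning; the one-dimensional corridor environment, rewards and hyperparameters below are all invented to keep the example self-contained.

```python
import numpy as np

# A toy corridor: states 0..4, goal at state 4, actions 0=left, 1=right.
N_STATES, GOAL = 5, 4
q_table = np.zeros((N_STATES, 2))
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    state = 0
    while state != GOAL:
        # Explore occasionally, otherwise exploit the best known action.
        action = (rng.integers(2) if rng.random() < epsilon
                  else int(np.argmax(q_table[state])))
        next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge the estimate toward
        # reward + discounted future value.
        q_table[state, action] += alpha * (
            reward + gamma * q_table[next_state].max() - q_table[state, action]
        )
        state = next_state

# Learned policy for states 0-3 is 'move right' - the machine built
# its own dataset of (state, action, reward) by trial and error.
print(np.argmax(q_table, axis=1))
```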
Semi-Supervised Learning is similar to Reinforcement Learning in that the machine is expected to complete or identify unlabelled parts of a dataset, but it differs in how the algorithm gets its guidance. In Reinforcement Learning, there will be a general parameter or set of parameters for success, whereas in Semi-Supervised Learning, the machine will have an incomplete dataset to guide its definition of success for its task.
Deep Learning, a class of ML
Whenever Artificial Neural Networks (ANNs) are involved in Machine Learning, we have ventured into Deep Learning. Neural Networks involve abstractions of data, where the final deciding algorithm does not make decisions directly off the data, but only after layers of algorithms have already processed it. An example might be when we decide to take a step back from a campfire: our ears heard the fire crackle, our eyes saw the flame, our nose was bothered by the smoke and our hand felt the heat. The final decision was to step back because we were too close to the fire, and our brain processed a variety of inputs to intuit a response – this is a fair analogy for a natural neural network. The reason for the biological example is that Artificial Neural Networks are based, in concept, on our best understanding of how the neurons that constitute our brain process information.
There are many different types of ANNs and these different types have emerged as ANNs are applied to the three different types of ML mentioned above.
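To make the layered-abstraction idea concrete, here is a minimal feed-forward network in plain numpy; the weights are random rather than trained, and the four ‘sense’ inputs echo the campfire analogy purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four 'senses' in, one hidden layer of abstractions, one decision out.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)  # input -> hidden layer
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)  # hidden -> output layer

senses = np.array([0.9, 0.8, 0.6, 1.0])  # e.g. crackle, flame, smoke, heat

hidden = sigmoid(senses @ W1 + b1)    # intermediate abstractions of raw inputs
decision = sigmoid(hidden @ W2 + b2)  # final layer decides off the abstractions
print(f"P(step back) = {decision[0]:.2f}")  # untrained, so the value is arbitrary
```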
Designing silicon brains and systems
Now, for most applications, whether for web or mobile, the engineering and technology teams behind the tools have to make a series of complicated decisions about how to design their systems. The considerations are always around the performance, security, cost and scalability of the application being developed, and the user experience it delivers. The engineering teams behind the product have to balance these different aspects to achieve the overall objective, and the balance varies by use case or product. Security and cost are usually correlated, so that when one goes up, so does the other – for instance, higher security in medicine, mobility or finance warrants the additional cost of security and encryption, while viewing online digital media may not. Performance and security are usually negatively correlated. The list goes on, and the important takeaway is that for digital or software products to happen, somewhere there have to be hardware components that process, store and manage the information involved.
In the age of cloud computing, we have created enormous efficiencies in technology infrastructure by centralizing our system hardware with a few specific providers who have proven to be exceptional at it. The reason why product managers and investors have to be aware of this is that differences in capabilities and cost between hardware system providers can have tremendous impacts on the bottom line. Additionally, the performance requirements of advanced ML systems can be so intensive that the engineers will actually split processing functions and algorithms between different components of the system to improve performance.
This practice has been happening for some time – software engineers will perform complicated calculations or high-intensity processing functions in the ‘back-end’ system components that have higher processing output, rather than on our ‘front-end’ devices, which have to share capacity with other applications running at the same time on the same device. The new challenge is splitting out functions in the back-end so that the ML algorithms perform their calculations in as well-designed and distributed a fashion as possible, to maximize performance and minimize system cost. If we all need a supercomputer to keep pace with new AI systems, there will not be many players who can afford such an entry, but teams with intelligent system design engineers can devise their software so that a network of computers runs in synchronicity to achieve the same result. Indeed, an area of tremendous opportunity is improving on the current hardware offered by cloud and hardware providers to enable more ML systems.
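As a toy illustration of the split-the-workload idea on a single machine – real distributed ML systems span many machines, and the `score_chunk` function here is an invented stand-in for a heavy per-record computation:

```python
from multiprocessing import Pool

def score_chunk(chunk):
    """Invented stand-in for a heavy per-record ML scoring function."""
    return [x * x for x in chunk]

if __name__ == "__main__":
    records = list(range(1_000_000))
    chunks = [records[i::4] for i in range(4)]  # split the workload four ways

    # Four workers run in parallel, like a small network of
    # computers working in synchronicity on one job.
    with Pool(processes=4) as pool:
        results = pool.map(score_chunk, chunks)
```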
Learning from human brains
It’s in our own minds that the idea of Machine Learning becomes most fascinating; after all, the best example we have of a thinking device is the human mind. These grey matter devices have consistently proven capable of adapting to new environments, discovering the secrets of the universe and intuiting appropriate courses of action in a wide variety of unpredicted environments. The natural process of evolution that took place in our physiology and psychology is now what we seek to replicate with machines. There will be differences that arise due to the large difference in constituent elements – one is a biological computer and the other is silicon-based. It is not hard to imagine a future where the distinction between the two falls away, as we potentially become more cybernetic to enhance our human capabilities and our ability to engineer biological life continues to improve. These are areas of advancement being explored by businesses like Neuralink and projects like the Human Genome Project.

The human brain is a complex system with many different processing centres that are activated on an as-needed basis. There is an urban legend that people only use about 10% of their brain at any given time, and while this might, in essence, be true, it is only so because our brain has organized its neurons to govern specific functions. It has mapped its system requirements such that the overall system will not reach its processing or bandwidth limitations, because only specific centres will be active at any time. Human potential and creativity are as yet unexhausted. We’re still creating new concepts and exploring ideas in new contexts – as our environment continues to change, so do we. For Machine Learning to reach its zenith, it will undoubtedly learn from the human experience, and for human and machine intelligences to blend together, we humans will have to continue to find new frontiers of our potential to push.
The key takeaway from this segment is the search for new neural connections that complete existing tasks more efficiently, or discover new frontiers to improve. These new neural paths, whether in silicon or grey matter, are difficult to create and often require experience or expertise to understand how to apply improvements to a situation. These frontiers represent the challenge of creating new neural paths and working in a more connected fashion with machines – we have to be creative in discovering these new territories or reimagining existing ones, while knowing how best to work with the current capabilities of our machine counterparts.
