Data Science, Machine Learning And Artificial Intelligence Queries and Answers
How to deploy a Machine Learning Model in Production?
Deploying a machine learning model into production can follow various approaches. These again depend on the existing architecture of the legacy system in use, and on whether it is a greenfield or a brownfield project.
Say you are in the telecommunication domain and deploying a model on customer data to identify Customer Lifetime Value. So what is this project all about? Even before ML was applied, the business was up and running. This means they already have a legacy system, which could be SAP or Oracle ERP/CRM or a custom-built system. This means there exists a frontend for capturing or keying in data. The data is stored in a database for computation and processing, and is summarized into a data mart or data warehouse for querying.
Typically, companies starting a first-generation machine learning practice would have a shadow server or mirror analytics server. The team would work on data extracted from this server. The data engineer or a full-fledged Data Scientist would create the preprocessing funnel, which could be done using tools like Informatica or Talend.
The modelling team or data scientist would then follow the usual model-building steps, which I am not discussing here as the question is about deployment. Once everything is ready, the team would keep the code stack on this analytical server for runs. The DBA would then authorize a CRON job, or a touch file driven by a scheduler, to run this whole code stack. The algorithm's outputs would then be written into the database at a predefined destination.
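A first-generation batch run like the one above can be sketched as a scheduled scoring script. This is a minimal sketch, not the author's actual pipeline: the table names, column names and the scoring rule are hypothetical, and SQLite stands in for the production database. In practice the DBA's CRON job would invoke a script of this shape.

```python
import sqlite3

def score(monthly_spend, tenure_months):
    # Placeholder "model": a real deployment would load a trained model
    # (e.g. from a serialized file) instead of this hand-written rule.
    return monthly_spend * tenure_months

def run_batch(db_path):
    # One scheduled run: read source rows, score them, and write the
    # results back into a predefined destination table in the database.
    con = sqlite3.connect(db_path)
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS clv_scores "
                "(customer_id INTEGER, clv REAL)")
    cur.execute("DELETE FROM clv_scores")  # refresh the destination
    rows = cur.execute("SELECT customer_id, monthly_spend, tenure_months "
                       "FROM customers").fetchall()
    for cust_id, spend, tenure in rows:
        cur.execute("INSERT INTO clv_scores VALUES (?, ?)",
                    (cust_id, score(spend, tenure)))
    con.commit()
    con.close()
```

The scheduler only needs to call `run_batch` against the analytics server's database; everything upstream (extraction, preprocessing) stays unchanged.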
Let's talk about organizations, typically product companies, in a second generation of machine learning practice. They might use Docker and REST APIs, with the help of developers, to make deployment a seamless process. This makes it easy to port these applications to new environments or to external clients and customers.
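The REST-serving idea can be sketched with nothing but the standard library. This is an illustrative sketch, not a production recipe: the `/predict` route, the feature names and the scoring rule are all assumptions of mine, and real services would typically use a framework like Flask or FastAPI inside the Docker image.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a real model's predict(); the rule is illustrative.
    return {"clv": features["monthly_spend"] * features["tenure_months"]}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve(port):
    # Caller runs server.serve_forever(), e.g. as the container's entrypoint.
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Because the model sits behind an HTTP endpoint, the consuming application never touches the code stack directly; porting to a new environment is a matter of shipping the container.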
Some organizations use cloud infrastructure, applying DevOps practices to build the environment. Development, testing and deployment would all happen on the cloud. In some cases, existing ML APIs on the cloud might be consumed rather than creating new ML models.
Is Python a Programming Language?
Surveys are generally used to establish a point; I'd say that's how they're misused. In truth, time and again, figures have been used to justify a choice already made rather than to arrive at a decision. I evaluate PhD theses, not from an academic perspective but from an industry viewpoint, and I have seen this happen.
Coming back: if you ask a seasoned programmer, Python is really not of the same breed as the classic programming language stacks. It doesn't stand in opposition to C, C++ or Java. I do concur that Python has simplicity of use. But it also comes with its single-threaded execution model, thanks to the Global Interpreter Lock taken each time the interpreter executes code. Concurrency is a problem and a limitation in Python.
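A small sketch of the concurrency point, assuming CPython: the threads below compute correct results, but because the Global Interpreter Lock lets only one thread execute Python bytecode at a time, this CPU-bound work gains no parallel speedup from threading (the function names are mine, and timing is deliberately not asserted since it varies by machine).

```python
import threading

def sum_to(n, results, idx):
    # Pure-Python CPU-bound loop. Under CPython's GIL, two such threads
    # take turns holding the interpreter rather than running in parallel,
    # so threading adds no CPU speedup here (multiprocessing would).
    total = 0
    while n > 0:
        total += n
        n -= 1
    results[idx] = total

def run_in_threads(n, workers=2):
    results = [0] * workers
    threads = [threading.Thread(target=sum_to, args=(n, results, i))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For I/O-bound work (network calls, file reads) threads do help, because the GIL is released while waiting; the limitation bites specifically on CPU-bound code.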
Python is convenient to use and adapt. For instance, TensorFlow, used for deep learning, is implemented in C++. Keras, on the other hand, is a wrapper written on top of it so that we don't have to take care of memory management ourselves. NumPy for mathematical computation and Pandas for data frames, along with scikit-learn for machine learning, are wonderful packages in Python. But if you evaluate Python against legacy languages like C, C++ and Java, or against newer stacks like Kotlin and Rust, Python stands no chance. The primary reasons are Android development and concurrent processing with multi-threading.
My view is that Python is good for machine learning but not for programming.
Recommended R libraries and packages for data science
The most useful and recommended packages or libraries in R would be:
- sqldf is a package used for querying a data frame in R using SQL syntax
- ggplot2 is a good package for visualization, giving the combined advantage of lattice and base R graphics
- reshape2 is good for rearranging (reshaping) data
- lubridate is a good package for managing date-time stamps
- dplyr, developed by Hadley Wickham of RStudio, for data frame manipulation
- forecast, apart from ts and timeSeries, is good for longitudinal data
- stringr, a part of the tidyverse, is good for handling, cleaning and preparing textual data
What are some use cases of Machine Learning in Manufacturing?
Some of the best use cases of Machine Learning have been on the assembly lines of automobile companies. Robotic intelligence detects a hairline fracture in sheet metal before it is pressed into the required shape. Even after the sheet metal is processed and given its final form, the robotics unit will detect any damage using infrared bounced off the surface.
A lot of data on assembly lines is relayed into machine learning systems from SCADA, and real-time decisions are taken accordingly.
With Industry 4.0, the application of Machine Learning, Mechatronics and Industrial IoT is mandatory.
The manufacturing domain today uses a lot of vibration data, sound data and image data. Machine learning in real time defines new thresholds of wear and tear on spares and equipment.
A lot of simulation is done these days to anticipate failure events. The tradeoff between spares replacement and equipment replacement is decided using machine learning models, which may defy OEM specifications.
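The idea of learning wear-and-tear thresholds from vibration data, rather than relying on fixed OEM limits, can be sketched very simply. This is an illustrative sketch under assumed data: real condition-monitoring systems use far richer models, and the readings and the mean-plus-three-sigma rule here are my own stand-ins.

```python
import statistics

def wear_threshold(baseline, k=3.0):
    # Learn a data-driven alarm threshold from baseline vibration
    # readings: mean + k standard deviations of normal operation.
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    return mu + k * sigma

def flag_anomalies(readings, threshold):
    # Indices of readings that exceed the learned threshold.
    return [i for i, v in enumerate(readings) if v > threshold]
```

The point is only that the threshold is derived from the equipment's own observed behaviour, so it adapts per machine instead of being a one-size-fits-all specification.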
Is the 70:20:10 training, testing and validation split in Machine Learning always applicable?
Statistical learning models like Linear or Logistic Regression typically use a training set of 70 percent of the data, with testing and validation sets of 20 and 10 percent. This is because statistical significance tests are involved. Besides, these models don't have the innate tendency to over-learn from data that is typically seen in ML models like Random Forests and boosted trees.
Using a Random Forest or bagged tree with 75% training data makes it easier to over-learn. This leads to a scenario where performance on the training data is exceptional but on other sets the model fails. Aside from this, no model performs, or remains the best fit, forever. A working model may become superfluous over a stretch of time on account of pattern changes. No vocalist, entertainer or cricketer is the favourite forever; as times change, so does the data, and the new generation won't incline toward the oldies. So your model has to be upgraded, either by re-learning the weights as more or fresher training data is parsed, or by changing the features used in the model to keep it relevant.
What's the difference between Blockchain and Machine Learning?
Technically speaking, there is a lot of usage of machine learning within blockchain. Functionally, Blockchain is a peer-to-peer distributed ledger system, while Machine Learning is about using algorithms for classification, prediction or detection of events. Blockchain is a distributed framework of authentication and validation in a peer-to-peer network. We can see it as moving to a barter system in economics, wherein there is no central authority to validate and certify. Blockchain has applications in legal, revenue, financial and health care data validation and management. At the core is the ciphering and encryption technology which makes it possible to validate a transaction. Unlike the RBI, which validates the authenticity of and guarantees a specific value to the currency, Bitcoin or Ethereum uses a peer-to-peer network, with miners' systems processing transactions on their machines with algorithms. The technology is Blockchain; the implementations are Bitcoin and many like it.
In a traditional banking system too, money is never actually physically transferred. It's a centralized ledger, wherein the banks, payment gateways, wallets and other players make entries of deposits and withdrawals. The same happens in Bitcoin; the only difference is that there is no central authority like the RBI in India or the Fed in the US.
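The core ledger idea, each record cryptographically committing to the one before it, can be sketched in a few lines. This is a toy illustration of the hash-chain principle only, not how Bitcoin is implemented (there is no mining, consensus or network here), and the entry strings are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def make_block(entry, prev_hash):
    # Each block's hash covers both its entry and the previous block's
    # hash, so altering any earlier entry invalidates every later hash.
    payload = json.dumps({"entry": entry, "prev": prev_hash}, sort_keys=True)
    return {"entry": entry, "prev": prev_hash,
            "hash": hashlib.sha256(payload.encode()).hexdigest()}

def build_chain(entries):
    chain, prev = [], GENESIS
    for e in entries:
        block = make_block(e, prev)
        chain.append(block)
        prev = block["hash"]
    return chain

def is_valid(chain):
    # Recompute every hash; any tampered entry breaks the chain.
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if make_block(block["entry"], prev)["hash"] != block["hash"]:
            return False
        prev = block["hash"]
    return True
```

This tamper-evidence is what lets peers validate the ledger among themselves, taking the place of the central authority's stamp.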
So technically it doesn't make much sense to compare the two.
What is the difference between Data Analytics and Business Analytics?
There is realistically very little difference between data analytics and business analytics, though a slight one does exist. It's more about who is doing it and how it's perceived by the observer.
Data Analytics is done by someone who is more on the technical side and doesn't have much exposure to business aspects. As your experience in data science grows, you look more like a Business Analytics resource. The father was a son once.
When we apply data analytics techniques on business data to create value, the process is called Business Analytics. Business or domain knowledge is what separates Business Analytics from Data Analytics.
If we apply data analytics to functions like HR and Operations, we refer to them as HR Analytics and Operations Analytics.
Who is a Data Consultant and what is the Job Profile?
Data Consultant is a very generic term. He or she would be the go-to person for greenfield or brownfield data implementations in any organization. Right from inception to completion, the success factor would ride on this role's delivery capacity. This is a technical advisory role.
In one assignment the consultant may be working with a company which doesn't have an IT backbone and is rolling out its implementation. The Data Consultant's role here would be to advise on the database architecture. Frontend, middleware and backend selection decisions are also key things to consider. Technology and infrastructure, whether it's an on-premise vs cloud implementation, ROI, costing and so on are a few of the aspects on which recommendations would be made. But the consultant would not be the project manager.
If a company already has an existing source data system, the data consultant may recommend implementation of Business Intelligence tools. Reporting, dashboarding and decision making would depend largely on successful implementation of BI/BO. While making these recommendations, scalability, security and resilience are key factors considered by a data consultant.
As the organization matures along the data lifecycle curve, it would want to harvest and mine its data. The Data Consultant would also define and structure the monetization of data-at-rest as well as data-in-motion repositories. Typically, databases are where you would have data in motion; these are transactional in nature. Data at rest would typically sit in data warehouses or data marts. Here too, the data consultant would be an integral part of infrastructure and capability development. The Data Consultant is an expert in matters related to data, its uses and applications, and a generalist in terms of data skills.
About the Author:
Mohan Rai
Mohan Rai is an alumnus of IIM Bangalore; he has completed his MBA from the University of Pune and a Bachelor of Science (Statistics) from the University of Pune. He is a Certified Data Scientist by EMC. Mohan is a learner and has been enriching his experience throughout his career by exposing himself to several opportunities in the capacity of an Advisor, Consultant and Business Owner. He has more than 18 years' experience in the field of Analytics and has worked as an Analytics SME on domains ranging from IT, Banking, Construction, Real Estate, Automobile, Component Manufacturing and Retail. His functional scope covers areas including Training, Research, Sales, Market Research, Sales Planning, and Market Strategy.