What challenges will data science face in the age of AI?

The development of AI technology has always attracted broad attention, and data science and machine learning are where much of the hard work lies. As we enter the second half of 2017, we can see that data science and machine learning teams face a common set of challenges. Suppose your company is already collecting data at scale, is using analytical tools, and has realized that data science can play a major role (improving decision making, streamlining business operations, increasing revenue, and so on) and has prioritized those opportunities. Collecting data and identifying the problems worth solving is not trivial, but assuming you have already made a good start in these areas, what challenges remain?

Data science is a broad topic, so I should clarify up front: this article focuses on the use of supervised machine learning.

Everything starts with (training) data

Suppose you have a team that handles data ingestion and integration, a team that maintains a data platform (the "source of truth"), new data sources that keep emerging, and domain experts who identify those sources. Since we are focusing mainly on supervised learning, it is not surprising that the lack of training data remains the primary bottleneck for machine learning projects.

There are some good research projects and tools for quickly creating large training data sets (or augmenting existing ones). Researchers at Stanford University have shown that weak supervision and data programming can be used to train models without a large amount of hand-labeled training data. Preliminary work on generative models by deep learning researchers has also produced encouraging unsupervised results in computer vision and other fields.
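
To illustrate the data programming idea, here is a minimal, hypothetical sketch in which a few hand-written labeling functions vote on each example and the aggregate vote becomes a noisy training label. The task, labeling functions, and thresholds are all invented for the example; real systems combine the votes with a learned model of each function's accuracy.

```python
import numpy as np

# Hypothetical labeling functions for a spam-detection task.
# Each returns +1 (spam), -1 (not spam), or 0 (abstain).
def lf_contains_link(text):
    return 1 if "http://" in text or "https://" in text else 0

def lf_all_caps(text):
    words = text.split()
    return 1 if words and all(w.isupper() for w in words) else 0

def lf_short_greeting(text):
    return -1 if len(text.split()) < 4 and text.lower().startswith("hi") else 0

LABELING_FUNCTIONS = [lf_contains_link, lf_all_caps, lf_short_greeting]

def weak_label(text):
    """Majority vote over labeling functions; None means every function abstained or they tied."""
    total = sum(lf(text) for lf in LABELING_FUNCTIONS)
    if total > 0:
        return 1
    if total < 0:
        return 0
    return None

unlabeled = ["Hi there", "CLICK NOW http://spam.example", "quarterly report attached"]
noisy_labels = [weak_label(t) for t in unlabeled]
print(noisy_labels)  # [0, 1, None]
```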

“Think about features rather than algorithms” is another useful way to evaluate data in the context of machine learning. A friendly tip: data enrichment may improve your existing models and, in some cases, even help alleviate cold-start problems. Most data scientists have at some point augmented their existing data sets with open source data or data from third-party providers, but I find that data enrichment is sometimes overlooked: acquiring external data, normalizing it, and experimenting with it feels less attractive than developing models and algorithms.
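
A minimal sketch of that kind of enrichment, assuming a hypothetical internal customer table and a purchased demographics table keyed by postal code (all column names and values are invented):

```python
import pandas as pd

# Internal data: one row per customer (hypothetical columns).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "postal_code": ["94107", "10001", "60601"],
    "monthly_spend": [120.0, 45.5, 80.0],
})

# External data purchased from a third party, keyed by postal code (hypothetical).
demographics = pd.DataFrame({
    "postal_code": ["94107", "10001", "60601"],
    "median_income": [110000, 85000, 72000],
    "population_density": [7200, 27000, 4500],
})

# A left join keeps every customer and adds the external features where available.
enriched = customers.merge(demographics, on="postal_code", how="left")
print(enriched.head())
```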

From prototype to product

Turning data science projects into products is the goal of many use cases. To make this process more efficient, a new job role has recently emerged: the machine learning engineer. There is also a new set of tools for easing the transition from prototype to product by tracking the context and metadata associated with analytic products.

The application of machine learning in products is still in its early stages, and best practices are only beginning to emerge. As advanced analytics models become more widespread, there are several points to consider, including:

Deployment environment: You may need to integrate with an existing logging or A/B testing infrastructure. In addition to deploying stable, high-performance models to servers, deployment increasingly includes deciding how and when to push models to the edge (mobile devices are a common example). New tools and strategies for deploying models to edge devices have emerged.

Scale, latency, freshness: How much data is needed to train the model? How quickly must the model return predictions? How often should the model be retrained and the data set refreshed? The last question implies that you have a repeatable data pipeline.

Bias: If your training data is not representative, you will get unsatisfactory (or even unfair) results. In some cases you may be able to adjust for this, for example with propensity scores or other reweighting methods.

Model monitoring: I think people underestimate the importance of monitoring models; in this respect, people with a statistics background have an advantage. Knowing when a model has degraded, and by how much, can be tricky, and concept drift may be a factor. For a classifier, one strategy is to compare the distribution of classes the model predicts with the distribution of classes actually observed (a minimal sketch follows this list). You can also track a business goal that is distinct from the machine learning evaluation metric; for example, a recommendation system might be tasked with surfacing "hidden" or long-tail content.

Critical applications: Models deployed in critical environments must be more robust than those in typical consumer applications. Moreover, machine learning applications in such environments must be able to run continuously for months (no memory leaks and so on).

Privacy and security: In general, if you can convince users and businesses that their data is safe, they may be more willing to share it. As mentioned above, data augmented with additional features tends to yield better results. An urgent issue for companies doing business in the European Union is the General Data Protection Regulation (GDPR), which comes into force in May 2018. On other fronts, practical research on adversarial machine learning and secure machine learning, including the ability to work with encrypted data, is emerging.
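
As promised under model monitoring, here is a minimal sketch of one drift check for a classifier: compare the distribution of predicted classes over a recent window with the class distribution actually observed, using a chi-square test. The class counts and alert threshold are invented for illustration.

```python
import numpy as np
from scipy.stats import chisquare

# Distribution of classes the model predicted over a recent production window.
predicted_counts = np.array([520, 310, 70])

# Distribution of classes actually observed (e.g., from delayed ground-truth labels).
observed_counts = np.array([800, 150, 50])
observed_props = observed_counts / observed_counts.sum()

# Expected predicted counts if the model matched the observed class distribution.
expected_counts = observed_props * predicted_counts.sum()

stat, p_value = chisquare(f_obs=predicted_counts, f_exp=expected_counts)
print("chi-square = %.1f, p = %.6f" % (stat, p_value))

# Illustrative alerting rule: flag the model for review when the mismatch is significant.
if p_value < 0.01:
    print("Predicted class distribution no longer matches observations; investigate the model and data.")
```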

Model development

Model and algorithm development gets more and more coverage in the media, but if you talk to data scientists, most will tell you that the lack of training data and the productization of data science are more pressing issues. In general, there are enough straightforward use cases that you can start with your favorite (basic or advanced) algorithm and tune or replace it later.

Because tools have made algorithms easy to apply, it is worth revisiting how you evaluate the results of a machine learning model. Do not lose sight of your business metrics and goals, because they do not necessarily coincide with the best-tuned or best-performing model. Keep an eye on developments around fairness and transparency, as researchers and companies are beginning to examine and address issues in this area. Concerns about privacy, combined with the proliferation of devices, have also spawned techniques that do not rely on centralized data sets.

Deep learning is gradually becoming something data scientists must understand. Originally used for computer vision and speech recognition, it is now being applied to the full variety of data types and problems data scientists work on. The challenges include choosing an appropriate network architecture (architecture engineering is the new feature engineering), tuning hyperparameters, and framing the problem and transforming the data to suit deep learning. (Coincidentally, one of the most interesting big data products I have seen this year is not based on deep learning.)
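
As a small illustration of hyperparameter tuning, here is a hedged sketch using scikit-learn's RandomizedSearchCV over a small multilayer perceptron; the parameter ranges, placeholder data, and iteration count are all invented, and the same idea carries over to deep learning frameworks.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for a real problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Illustrative search space: network width/depth and optimization settings.
param_distributions = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32), (128, 64)],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_distributions=param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```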

In many cases, users prefer an interpretable model (and in some settings a black-box model is simply not accepted). An interpretable model is also easier to improve, since its underlying mechanics are easier to understand. With the rise of deep learning, companies are beginning to use tools that explain why a model made a particular prediction, as well as tools that explain where a model came from (by tracking the learning algorithm and the training data).
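
One simple way to approximate that kind of explanation is a global surrogate: fit a small, interpretable model to the predictions of the black-box model and inspect it. The sketch below uses scikit-learn and invented placeholder data; it illustrates the general idea rather than any particular vendor's tool.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data and a "black box" standing in for the production model.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Fit a shallow decision tree to mimic the black box's predictions.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box.
fidelity = surrogate.score(X, black_box.predict(X))
print("surrogate fidelity: %.3f" % fidelity)
print("surrogate feature importances:", surrogate.feature_importances_)
```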

Tools

I am not going to give a list of tools, because there are simply too many to list. Tools that help us ingest, integrate, process, prepare, and store data and deploy models are all important. Here are a few thoughts on machine learning tools specifically:

Python and R are the most popular machine learning programming languages. For those who want to use deep learning, Keras is the most popular entry point (a library, not a language); a minimal example appears after this list.

While notebooks seem to be good model development tools, integrated development environments (IDEs) remain very popular among R users.

There are many libraries for general machine learning and deep learning, some of which are better than others at easing the transition from prototype to product.

Scaling from a single machine to a cluster is an important consideration; Apache Spark is a widely used framework here. That said, after some data wrangling, your data set will often fit comfortably on a single beefy server.

Vendors are beginning to support collaboration and version control.

Finally, you may want data science tools that integrate seamlessly with your existing ecosystem and data platforms.
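
The minimal Keras example referenced above: a tiny binary classifier on placeholder data, intended only to show how little code an entry-level deep learning model requires (the data, layer sizes, and training settings are arbitrary).

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Placeholder features and labels.
X = np.random.rand(1000, 20)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```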

Now is a good time for companies to evaluate which of their problems and use cases are suitable for machine learning. I have summarized some recent trends and the bottlenecks that remain unresolved. The main conclusion is that you can start using machine learning now: begin with a problem for which you already have some data, and then build a great model.
