Interpreting Image Classification Models via Crowdsourcing


Students at Delft University of Technology, the Netherlands, carried out a crowdsourcing study as part of the Crowd Computing Course designed by Asst. Prof. Ujwal Gadiraju and Prof. Alessandro Bozzon around one key challenge: the creation and consumption of (high-quality) data. Course participants presented several brilliant group projects at the Crowd Computing Showcase event held on 06.07.2021. The group consisting of Xinyue Chen, Dina Chen, Siwei Wang, Ye Yuan, and Meng Zheng was judged to be among the best. The details of this study are described below.

Background

Saliency maps are an important aspect of Computer Vision and Machine Learning. Annotating saliency maps, like all data labeling, can be done in a variety of ways; in this case, crowdsourcing was used since it is considered to be one of the fastest methods. The goal was to obtain annotated maps that could be used to acquire a valid explanation for model classifications. Four task designs were used in the experiment.

Method

Preparation

As a first step, an ImageNet-pretrained Inception V3 model was used to extract saliency maps from the original images. The model was subsequently fine-tuned on CornellLab’s NABirds dataset, which contains over 500 bird species; 11 of those species were selected for the project. SmoothGrad was used to minimize noise in the maps.

Fig. 1 Example image of a saliency map
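To make the pipeline above more concrete, here is a minimal Python sketch of a SmoothGrad-style saliency map using a pretrained Inception V3 from torchvision. This is not the group’s actual code; the sample count, noise level, and the random placeholder input are assumptions.

import torch
from torchvision.models import inception_v3

model = inception_v3(pretrained=True).eval()  # ImageNet-pretrained backbone
image = torch.rand(1, 3, 299, 299)            # placeholder for a preprocessed bird photo

def smoothgrad_saliency(model, image, target_class, n_samples=25, noise_std=0.15):
    """SmoothGrad: average the input gradients over several noisy copies of the image."""
    grads = torch.zeros_like(image)
    for _ in range(n_samples):
        noisy = (image + noise_std * torch.randn_like(image)).requires_grad_(True)
        score = model(noisy)[0, target_class]   # logit of the class of interest
        score.backward()
        grads += noisy.grad
    # Collapse the RGB channels into a single H x W map; brighter = more influential pixel.
    return (grads / n_samples).abs().max(dim=1)[0]

saliency_map = smoothgrad_saliency(model, image, target_class=94)  # arbitrary class index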

Experimental Design

Four types of tasks were used in the course of the experiment: one control task that became the baseline and three experimental tasks. Those three were: training, easy tagging (ET), and training + ET. Each task consisted of 74 images that took approximately three minutes to process. Each saliency map was annotated by three different crowd workers.

Task: Baseline

Three functional requirements had to be met in this part of the experiment:

  1. Instruction – the crowd performers’ understanding of the instructions.
  2. Region selection – the performers’ ability to correctly use the interface tools to mark highlighted areas.
  3. Text boxes – the performers’ ability to use the input boxes appropriately to enter relevant information.

Fig. 2 Baseline interface

Task: Training

The performers were asked to complete a set of training tasks that were designed using Toloka, a crowdsourcing platform. A training pool with three 3-minute tasks was created. The performers had to finish all of the tasks with a minimum accuracy of 70% in order to proceed to the experimental tasks. After this was achieved, the main study began.

Task: Easy Tagging (ET)

As part of the experimental task, the crowd workers had to recognize and label various body parts of bird species. To do that, a reference picture was provided. Since the group’s pilot study showed that color was among the most commonly used characteristics, color checkboxes were provided to make color attribute annotations easier for the subjects. In addition, all input boxes offered both “suggestion” and “free input” options, for cases when the performers wished to annotate non-color attributes or the colors provided in the answer box did not match the colors displayed in the image.

Fig. 3 Easy Tagging Interface

Quality Control

Quality control mechanisms were consistent across all four tasks. The performers were asked to use only desktops or laptops during the study to make sure that labeling objects with the bounding boxes was easy and done in the same way throughout. In addition, all of the subjects were required to have secondary education and be proficient in English. Captcha and fast response filtering were used to filter out dishonest workers. The answers were checked manually and accepted based on the following criteria:

  1. At least one bounding box was present.
  2. At least one pair of entity-attribute descriptions was present.
  3. The indices of the bounding boxes had to correspond to the indices of the offered descriptions.

Evaluation Metrics

  1. IOU Score

Intersection over Union (IOU) was used to evaluate the accuracy of the bounding boxes. It is calculated by dividing the area of intersection of two bounding boxes by the area of their union (see the short sketch after this list). The final IOU score is a composite average of multiple IOU values.

  2. Vocabulary Diversity

This metric consists of two values: entity diversity (number of distinct words), and attribute diversity (number of adjectives used to describe one entity).

  3. Completeness

This metric pertains to how complete an annotated saliency map is. It is calculated by dividing the value of the annotated saliency patches by the value of the ground truth annotations.

  4. Description Accuracy

This metric represents a percentage of valid entity-attribute descriptions. The value is calculated by aggregating and averaging the results from three different crowd workers.

  5. Accept Rate

This metric is calculated by dividing the number of accepted annotations by the total number of submissions.

  6. Average Completion Time

This metric reflects average duration values of the annotation tasks.

  7. Number of Participants

This metric pertains to the total number of distinct crowd workers participating in the experiment.
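As a quick illustration of the IOU metric described above, here is a minimal Python sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples:

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # about 0.14; the final score averages many such values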

Results

  1. The average completion time for all tasks was 3 minutes as predicted.
  2. The mean IOU score was lower in tasks 3 and 4 compared to 1 and 2. This is likely to be the result of the interface differences since the bounding boxes in tasks 3 and 4 contained only one color.
  3. The difference between the mean IOU scores of tasks 1 and 2 is statistically significant (p=0.002) and is in favor of task 2. The difference between the IOU scores of tasks 3 and 4 is not statistically significant (p=0.151).
  4. Training significantly increased completeness (p=0.001). Likewise, easy tagging also raised completeness levels from the baseline values.
  5. No statistically significant difference in entity diversity was observed between tasks 1 and 2 (p=0.829) and tasks 3 and 4 (p=0.439). This was expected since vocabulary diversity was not specifically covered in the training phase.
  6. Training was shown to significantly improve description accuracy compared to the baseline values (p=0.001).
  7. Accuracy also increased significantly as a result of the easy tagging interface (p=0.000).
  8. From the within-interface perspective, the difference in attribute diversity of tasks 1 and 2 was statistically significant and in favor of task 1 (p=0.035), which implies that training tends to diminish baseline diversity. No statistically significant differences were observed between the attribute diversities of tasks 3 and 4 (p=0.653).
  9. From the between-interface perspective, a statistically significant difference was observed between tasks 2 and 4 that had different interfaces (p=0.043). This implies that training and interface design are interdependent.

Discussion

Two conclusions can be drawn from this study. One is that performance values depend on what type of interface is being used. In this respect, shortcuts can both help and hinder by either lifting some of the performer’s cognitive load or backfiring and making the performer too relaxed and unfocused. The second conclusion is that training can increase bounding box and description accuracy; however, it can also take away from the subject’s creativity. As a result, requesters have to consider this trade-off before making a decision regarding task design.

Certain limitations of the study should also be taken into account. The most obvious one is that the study should ideally have been conducted as a between-group experiment, which unfortunately was not possible. The second limitation is the small number of participants in the tasks that required training; the corresponding values are likely to be skewed as a result. The last major limitation has to do with applicability: since only aggregated averages across multiple granularities were used as the final values, these figures are unlikely to accurately represent most non-experimental settings.

Since one of the findings suggests that input shortcuts can increase accuracy while diminishing creativity, future studies should look at different designs with multiple shortcuts (e.g., shape and pattern). In this scenario, the negative side effects of decreased creativity and boredom may be countered with more sophisticated interfaces that are practical and user-friendly. Finally, the authors propose a switch from written to video instructions, as these will likely be more effective and result in a greater number of subjects finishing the training phase.

Project in a nutshell

Saliency maps are an integral part of ML’s advance towards improved Computer Vision. Like other forms of data labeling, annotating saliency maps is central to training classification models and interpreting their outputs. Using crowd workers recruited from the Toloka crowdsourcing platform and CornellLab’s NABirds dataset, this project examined how crowdsourcing can be used for saliency map annotation. Four types of tasks were used: one served as the baseline, and the other three (training, easy tagging (ET), and training + ET) were the experimental tasks. Several metrics were used for evaluation, including the IOU score, vocabulary diversity, completeness, accuracy, accept rate, and completion time, among others. Results showed that the choice of interface had a major effect on performance. In addition, training increased bounding box and description accuracy but also diminished the subjects’ creativity. Implications of these findings and suggestions for future studies are discussed.

Elevate your Employee Recognition Program with HR Tech

 

Employee recognition has long been a cornerstone of good management. However, as the battle for talent heats up, how businesses appreciate their employees is more crucial than ever.

Businesses gain from employee recognition in a variety of ways. Employees who believe their job is recognized and valued will have a greater sense of purpose and will be more engaged and productive.

HR departments may struggle to execute a successful employee appreciation program if they don’t have the necessary resources. Manually tracking results is time-consuming, and HR departments must balance this obligation with other responsibilities and tasks, some of which can be automated.

Fortunately, when organizations are aware of their objectives and have a well-defined company culture, the problems of integrating the correct technology can be lessened. 

Importance of Employee Recognition: 

Employees can understand that their organization values them and their contributions to the success of their team and the firm as a whole when they receive recognition. This is especially important as organizations expand and evolve. It helps employees feel secure in their worth to the organization, which motivates them to keep doing exceptional jobs.

For organizations, happier employees mean more harmony in the workplace, as well as higher retention rates. And, because these people are more committed and productive, your company’s income will rise. 

Role of Technology in Employee Recognition: 

Recruiting, onboarding, training, and administering leave are just a few of the responsibilities of the HR department. They can automate manual activities with the correct software, giving them more time to focus on the important areas of their job. Screening software, for example, can help firms save time when it comes to calculating PTO, scheduling, and evaluating applicants’ CVs.

HR will be able to devote more time to more important responsibilities as a result of the added time. Teams can also use technology to digitize their wellness programs and reward programs. Employees will be able to gain a better understanding of their performance and determine how to achieve their objectives more rapidly as a result.

HR departments can use software to improve employee recognition activities beyond simply congratulating employees at the next team meeting. Companies can track their employees’ performance using segmented data with the correct tools. 

Technology allows firms to cultivate employee-to-employee appreciation in addition to employer-to-employee acknowledgment. When someone performs well, for example, team members can offer badges and compliments. Employees are motivated to do their best for the team and your company when they receive favorable feedback. 

HR Tech in Employee Recognition Program: 

Gamification – Gamification is a strategy for improving systems, services, organizations, and activities by generating experiences similar to those found in video games in order to motivate and engage users.

HR departments have a lot of alternatives when it comes to gamifying an employee recognition program. One possibility is to construct reward levels, with employees earning experience points (XP) as they go. They earn a prize if they reach a particular point.

Quizzes and videos can also be used to make an employee appreciation program more gamified. 

Tracking Employee Performance – Employees generate complicated data at work, and technology is required to fully comprehend employee performance. HR technology makes it easier to track how close employees are to meeting their key performance indicators (KPIs) and to see how far they’ve come in their time with the organization.

Team managers can also leave feedback using HR technology. While they can do so in real-time chats, keeping a record implies that employees and other management can access this information anytime they need it. 

Automate Tasks That Waste Time – Every day, employees waste a significant amount of time on unproductive tasks. Two culprits are pointless meetings and emails.

Both of these issues are simple for HR personnel to resolve. Unnecessary meetings can be avoided with the use of communication tools such as Slack and Skype. Automating boilerplate memos and emails, as well as background checks, payroll, and even parts of recruiting, is a fascinating possibility.

HR processes have become a lot simpler thanks to technological advancements. Teams can use automation to focus on higher-level projects that require more attention.

Teams can also use HR technologies to decide who deserves to be rewarded. Because managers and HR staff can see the output for each employee, digitizing incentive programs removes any possible partiality.

Teams that integrate technology into their employee recognition plan will, in the end, be able to acknowledge achievement more effectively—and earn happier, more engaged employees as a result. 

ABOUT THE AUTHOR: ADVANTAGE CLUB

Advantage Club is a global provider of employee benefits. The platform serves to digitize all employee demands under one canopy through numerous employee engagement programs such as incentives, rewards & recognition, flexible and tax-saving modules. It now serves over 300 organizations in 70 countries and has over 10,000 brand partnerships.

 

Big Data Analytics: The Role it Plays in the Banking and Finance Sector

The finance industry generates a huge amount of data. Big data in finance refers to the petabytes of structured and unstructured information that help anticipate customer behavior and create strategies that support banks and financial institutions. The structured information managed within an organization provides key decision-making insights, while the unstructured information, whose volume keeps increasing across multiple sources, offers significant analytical opportunities.

The world generates a staggering 2.5 quintillion bytes of data every single day! Seeing the abundance of data we generate, most businesses are now seeking to use this data to their benefit, including the banking and finance sector. But how can they do that? With big data, of course. Here are some of its many benefits in the context of banks to help you better understand.

  1. Monitor customer preferences: Banks have access to a virtual goldmine of highly valuable data that is largely generated by customers themselves. As a result, financial institutions have a clear insight about what their customers want, which allows them to offer them better services, products, etc. that are in sync with their requirements.
  2. Prevent fraud: Since these systems typically involve the use of high-grade algorithms and analytics, banks can benefit immensely in the risk department. Such systems can identify potentially fraudulent activities and deter malicious behavior.

Allow us to now walk through some of the key challenges:

  1. Legacy systems: The mind-boggling amount of data involved in banking operations can easily stress out a bank’s legacy systems. This is why experts recommend upgrading one’s systems before integrating big data.
  2. Data quality management: Outdated, inaccurate, and incomplete data poses grave challenges, often spoiling the results of analytics, etc. Hence, banks must adopt processes to ensure data is reviewed before it enters the system.
  3. Consolidation: Banks add a humongous amount of data to their databases every single day, which is then channeled into different systems for better use. However, this can result in data silos and prevent the free flow of data within systems and teams. Hence, it is important to consolidate data immediately.

Finally, let us also take a look at some of its use cases:

  1. Enhanced user targeting: It is abundantly clear that big data can help banks understand their customers better among other things. One key way to use such insights is by applying them to marketing campaigns, ensuring they are better targeted and thus, primed to deliver better results.
  2. Tailored services: It is not news that today’s customers are, well, extremely finicky and demanding. Now, to win them over and ensure their loyalty, banks are putting big data to work so they can better understand customers, their requirements, etc. This information is then used to tailor the company’s offerings and services to achieve better sales and business results.
  3. Better cybersecurity: Given the abundance of data security risks and threats this sector faces every single day, it ought to come as no surprise that banks are now turning to big data for help. It typically involves the use of real-time machine learning along with predictive analytics on big data to identify risky behavior, reduce risk, etc.

There is not even a shred of doubt that digital transformation in the finance and banking sector has had a significant impact on the world. Thankfully, save for a few challenges, most of these changes have benefited customers first and companies as well. To cut a long story short, any business in this sector that hopes to thrive will do well to embrace big data and leverage its countless benefits for the business’s future.

Data Labeling for Machine Learning Models

Machine learning models make use of training datasets for predictions, and labeled data is thus an important component for making machines learn and interpret information. A variety of data is prepared, identified, and marked with labels (often called tags) in the form of images, videos, audio, and text elements. Defining these labels and categorization tags generally involves human-powered effort.

Machine learning models, which fall under the categories of supervised and unsupervised, pick up these datasets and make use of the information according to their algorithms. Data labeling for machine learning, or training data preparation, encompasses tasks such as data tagging, categorization, labeling, model-assisted labeling, and annotation.

Machine Learning Model Training

The majority of effective machine learning models use supervised learning, which uses an algorithm to map inputs to outputs. ML applications such as facial recognition, autonomous driving, and drones require supervised learning, and their reliance on labeled data increases accordingly. In supervised learning, models are typically trained by minimizing the average loss over the training data, an approach referred to as empirical risk minimization; if the labels are noisy or wrong, the model learns those errors, so data labeling and quality assurance must be rigorous.

In machine learning, as a norm, datasets are characterized by three main properties (dimensionality, sparsity, and resolution), and the data structure can also vary depending on the business problem: textual data, for example, can be record-based, graph-based, or ordered. The human-in-the-loop uses labels to identify and mark predefined characteristics in the data. If the ML model is to predict accurate results, the dataset quality must be maintained. For example, labels in a data set identify whether an image contains an object such as a cat or a human, and also pinpoint the shape of the object. In a process known as “model training,” the machine learning model employs the human-provided labels to understand the underlying patterns. As a result, you’ll have a trained model that you can use to generate predictions and to develop a customized model based on fresh data.
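To make this concrete, a single labeled training example might look like the following; the field names and values are purely hypothetical and not a standard schema:

# One hypothetical labeled example: the label names the object, the bounding box marks its extent.
labeled_example = {
    "image": "images/0001.jpg",          # the raw data item
    "label": "cat",                      # class tag assigned by a human annotator
    "bounding_box": [34, 50, 210, 190],  # x_min, y_min, x_max, y_max in pixels
    "annotator_id": "worker_17",         # useful for quality checks and agreement metrics
}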

Use Cases of Data Labeling in Machine Learning

Several use cases and AI tasks pertaining to computer vision, natural language processing, and speech recognition need appropriate forms of data labeling.

1. Computer Vision: To produce your training dataset for a computer vision system, you must first label images, pixels, or key points, or create a bounding box that completely encloses a digital image. Once the annotation is done, a training data set is produced and the ML model is trained on it.

2. Natural Language Processing: To create your training dataset for natural language processing, you must first manually pick key portions of text or tag the text with particular labels, and justify those labels in the text. Sentiment analysis, entity name identification, and optical character recognition (OCR) are all done using natural language processing approaches.

3. Audio Annotation: Audio annotations are used for machine learning models that use sounds in a structured format, for example through the extraction of audio data and tags. NLP approaches are then applied to the tagged sounds to interpret them and obtain the learning data.

Maintaining Data Quality and Accuracy in Data Labeling

Normally, the training data is divided into three forms: a training set, a validation set, and a testing set. All three are crucial for training the model. Gathering the raw data, collating it, and properly defining its attributes are important steps before getting it labeled.
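As a rough, hypothetical sketch of that split in Python (the 70/15/15 proportions and the synthetic data are assumptions, not part of the article):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy labeled dataset standing in for real annotated training data.
X, y = make_classification(n_samples=1000, n_features=13, random_state=42)

# Roughly 70% training, 15% validation, 15% test, preserving class proportions (stratify).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)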

Machine learning datasets must be accurate and of high quality. Accuracy refers to how correct each piece of data’s labeling is with respect to the business problem it aims to solve. Equally crucial are the tools used for labeling or annotating data. AI platform data labeling services form the core for developing dependable ML models for artificial intelligence-based programs.

Cogito is one of the best data labeling companies, offering quality training data for the machine learning industry. It makes use of Labelbox model-assisted labeling.

The company has set the industry standard for quality and on-time delivery of AI and ML training data by partnering with world-class organizations. Cogito is well known in the AI community for providing reliable datasets for various AI models, as the company fully supports data protection and privacy legislation. Cogito provides its clients with complete data protection rights governed by the norms and regulations of the GDPR and CCPA, ensuring total data privacy.

The Growing Impact of AI in Financial Services: Six Examples

Financial markets are constantly evolving, facing some of the biggest challenges in their history, such as the full digitization of markets, technology-driven disruption, and reduced client switching costs. To cope with these challenges and keep moving ahead, the industry has embraced the new opportunities created by technology advancements like high-speed processing, the convergence of data ubiquity, and AI software solutions.

As the industry continues to work towards digitizing and transforming for new growth and operations efficiencies, they are proactively focusing on innovating and differentiating by partnering up with AI development companies to deliver enhanced client experience. From AI trading to AI fraud detection, AI solutions are helping organizations redefine their operations more efficiently. By implementing such solutions, the firms are able to leverage their own data and deliver bottom-line results. 

Artificial Intelligence is being used in different industries and finance is no exception. One of the main advantages of AI solutions is their ability to work with huge databases and finance is one industry that can utilize AI solutions to its full potential. These solutions are already being implemented in areas such as insurance, banking, asset management, etc. 

Applications of AI in finance

AI solutions can be used in different ways. AI-based chatbots can help financial firms communicate with their clients and also serve as the basis for virtual assistants. With the help of AI solutions, organizations can enable algorithmic trading based on machine learning algorithms, and AI can also be used for risk management, relationship management, fraud detection, and more.

AI in finance offers a lot of benefits, but one of the main advantages is that it brings endless automation opportunities for financial organizations. Automation can help organizations increase their productivity and operational efficiency. Moreover, in some situations, AI can replace manual effort and help eliminate human biases and errors. AI solutions also enhance data analysis: AI-based machine learning solutions help identify patterns, providing valuable insights and helping firms make better decisions.

Below are some of the ways in which AI solutions are applied in finance: 

  • Automation: Automation enables organizations to enhance productivity and cut down their operating costs. Time-consuming tasks could be completed much faster. For example, AI can use character recognition to verify data automatically and generate reports based on certain parameters. It helps organizations in eliminating human errors and enables the employees to focus on more important tasks by providing them with extended bandwidth. Research suggests that AI helps organizations to save up to 70% of the costs associated with data entry and other repetitive tasks. 
  • Credit Decisions: AI solutions help banks analyze potential borrowers more accurately. They can quickly analyze countless factors and parameters that can impact the bank’s decision. AI solutions can use complex credit scoring approaches as compared to traditional systems. These solutions offer a higher degree of objectivity as these solutions are not biased which is essential especially in the financial sector. 
  • Trading: The trend of data-driven investments has been picking up pace in the last couple of years. AI and machine learning solutions are being used for algorithmic trading. These systems can analyze huge amounts of structured and unstructured data quickly. The speed at which they process data leads to quicker decisions and transactions, increasing profit within the same time period. These algorithms make precise predictions based on a lot of historical data. They can test different trading systems, offering traders insights into each of them before making a decision, and can also help analyze long-term and short-term goals to provide recommendations on portfolio decisions.
  • Risk Management: Risk management is another area wherein AI solutions help organizations. The incredible processing power of AI solutions can handle risk management more efficiently than human effort alone. Algorithms can analyze the history of risks and detect any potential problems in a timely manner. AI-based solutions can analyze various financial activities in real time, regardless of the current market environment. Organizations can select important parameters for their business planning and use them to get insights, forecasts, and predictions for the future.
  • Fraud Prevention: AI-based solutions are proving to be effective in preventing and identifying fraud. Cybercriminals are quick to develop new tactics, but with the help of AI-based solutions, organizations can quickly identify and adapt to hackers’ strategies. These solutions are especially effective when it comes to dealing with credit card fraud. AI-driven algorithms can analyze a client’s behavior, keep track of their locations, and identify their purchasing patterns, so they can detect any unusual activity associated with a certain account.
  • Personalized Banking: AI-based solutions are among the best when it comes to providing a personalized experience. Financial institutions can use AI-based chatbots to offer timely help to their customers while minimizing the workload of their customer representatives. They can also adopt various voice-controlled virtual assistants for a personalized experience. These solutions are self-learning in nature, i.e., they identify patterns and learn on their own, so they become more effective with time. There are a lot of solutions that offer personalized financial advice to their users. These systems use algorithms that can track regular expenses, income, and purchasing habits to provide financial suggestions based on the user’s financial goals.

The financial market has been strongly influenced by technological advancements. We are operating in an environment where speed and convenience are the competing advantages in the industries, especially in the financial markets. The digital transformation has increased the competition like never before, therefore this industry is becoming increasingly volatile and competitive. To stay relevant in the given circumstances, organizations need to keep up with the latest technological advancements while partnering up with tech companies like AI development companies as these companies would help them gain a significant advantage by preparing them for the new opportunities that the tech offers.

Understanding the Complexity of Metaclasses and their Practical Applications

Metaprogramming is a collection of programming techniques which focus on the ability of programs to introspect themselves, understand their own code, and modify themselves. Such an approach to programming gives programmers a lot of power and flexibility. Without metaprogramming techniques, we probably wouldn’t have modern programming frameworks, or those frameworks would be way less expressive. 

This article is an excerpt from the book, Expert Python Programming, Fourth Edition by Michal Jaworski and Tarek Ziade – A book that expresses many years of professional experience in building all kinds of applications with Python, from small system scripts done in a couple of hours to very large applications written by dozens of developers over several years. 

Metaclass is a Python feature that is considered by many as one of the most difficult things to understand in this language and thus avoided by a great number of developers. In reality, it is not as complicated as it sounds once you understand a few basic concepts. As a reward, knowing how to use metaclasses grants you the ability to do things that are not possible without them. 

Metaclass is a type (class) that defines other types (classes). The most important thing to know in order to understand how they work is that classes (so types that define object structure and behavior) are objects too. So, if they are objects, then they have an associated class. The basic type of every class definition is simply the built-in type class (see Figure 1). 

Figure 1: How classes are typed 

In Python, it is possible to substitute the metaclass for a class object with your own type. Usually, the new metaclass is still a subclass of the type class (refer to Figure 2), because not doing so would make the resulting classes highly incompatible with other classes in terms of inheritance: 

 

Figure 2: Usual implementation of custom metaclasses 

Let’s take a look at the general syntax for metaclasses in the next section.  

The general syntax 

The call to the built-in type() class can be used as a dynamic equivalent of the class statement. The following is an example of a class definition with the type() call:  

def method(self):
    return 1

MyClass = type('MyClass', (object,), {'method': method})

This is equivalent to the explicit definition of the class with the class keyword: 

class MyClass:
    def method(self):
        return 1

Every class that’s created with the class statement implicitly uses type as its metaclass. This default behavior can be changed by providing the metaclass keyword argument to the class statement, as follows: 

class ClassWithAMetaclass(metaclass=type):
    pass

The value that’s provided as a metaclass argument is usually another class object, but it can be any other callable that accepts the same arguments as the type class and is expected to return another class object. The call signature of metaclass is type(name, bases, namespace) and the meaning of the arguments are as follows: 

  • name: This is the name of the class that will be stored in the __name__ attribute 
  • bases: This is the list of parent classes that will become the __bases__ attribute and will be used to construct the MRO of a newly created class 
  • namespace: This is a namespace (mapping) with definitions for the class body that will become the __dict__ attribute 

One way of thinking about metaclasses is as the __new__() method, but working at a higher level: the level of class definition. 

Despite the fact that functions that explicitly call type() can be used in place of metaclasses, the usual approach is to use a different class that inherits from type for this purpose. The common template for a metaclass is as follows: 

class Metaclass(type):
    def __new__(mcs, name, bases, namespace):
        return super().__new__(mcs, name, bases, namespace)

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        return super().__prepare__(name, bases, **kwargs)

    def __init__(cls, name, bases, namespace, **kwargs):
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        return super().__call__(*args, **kwargs)

The name, bases, and namespace arguments have the same meaning as in the type() call we explained earlier, but each of these four methods is invoked at a different stage of the class lifecycle: 

  • __new__(mcs, name, bases, namespace): This is responsible for the actual creation of the class object in the same way as it does for ordinary classes. The first positional argument is a metaclass object. In the preceding example, it would simply be a Metaclass. Note that mcs is the popular naming convention for this argument. 
  • __prepare__(mcs, name, bases, **kwargs): This creates an empty namespace object. By default, it returns an empty dict instance, but it can be overridden to return any other dict subclass instance. Note that it does not accept namespace as an argument because, before calling it, the namespace does not exist yet. Example usage of that method will be explained later in the Metaclass usage section. 
  • __init__(cls, name, bases, namespace, **kwargs): This is not commonly seen in metaclass implementations but has the same meaning as in ordinary classes. It can perform additional class object initialization once the class is created with __new__(). The first positional argument is now named cls by convention to mark that this is already a created class object (a metaclass instance) and not a metaclass object. When __init__() is called, the class has already been constructed, so the __init__() method can do less than the __new__() method. Implementing such a method is very similar to using class decorators, but the main difference is that __init__() will be called for every subclass, while class decorators are not called for subclasses.
  • __call__(cls, *args, **kwargs): This is called when an instance of a metaclass is called. The instance of a metaclass is a class object (refer to Figure 1); it is invoked when you create new instances of a class. This can be used to override the default way in which class instances are created and initialized (a small illustration follows this list).
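As a small, hypothetical illustration of that last point, overriding __call__() is how the well-known singleton pattern can be implemented with a metaclass. This sketch is not from the book excerpt; it is just a common application of the technique:

class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Only the first call to the class actually creates an instance;
        # later calls return the cached object.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Configuration(metaclass=Singleton):
    def __init__(self):
        self.settings = {}

assert Configuration() is Configuration()  # every call returns the same object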

Each of the preceding methods can accept additional extra keyword arguments, all of which are represented by **kwargs. These arguments can be passed to the metaclass object using extra keyword arguments in the class definition in the form of the following code: 

class Klass(metaclass=Metaclass, extra="value"):
    pass

This amount of information can be overwhelming at the beginning without proper examples, so let’s trace the creation of metaclasses, classes, and instances with some print() calls: 

class RevealingMeta(type):
    def __new__(mcs, name, bases, namespace, **kwargs):
        print(mcs, "__new__ called")
        return super().__new__(mcs, name, bases, namespace)

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        print(mcs, "__prepare__ called")
        return super().__prepare__(name, bases, **kwargs)

    def __init__(cls, name, bases, namespace, **kwargs):
        print(cls, "__init__ called")
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        print(cls, "__call__ called")
        return super().__call__(*args, **kwargs)

Using RevealingMeta as a metaclass to create a new class definition will give the following output in the Python interactive session: 

>>> class RevealingClass(metaclass=RevealingMeta):
...     def __new__(cls):
...         print(cls, "__new__ called")
...         return super().__new__(cls)
...     def __init__(self):
...         print(self, "__init__ called")
...         super().__init__()
...
<class 'RevealingMeta'> __prepare__ called
<class 'RevealingMeta'> __new__ called
<class 'RevealingClass'> __init__ called

And when you try to create an actual instance of RevealingClass, you get the following output: 

>>> instance = RevealingClass()
<class 'RevealingClass'> __call__ called
<class 'RevealingClass'> __new__ called
<RevealingClass object at 0x1032b9fd0> __init__ called

Let’s now take a look at metaclass usage and applications. 

Metaclass Usage and Applications  

Metaclasses are a great tool for doing unusual and sometimes wonky things. They give a lot of flexibility and power in modifying typical class behaviour. So, it is hard to tell what the common examples of their usage are. It would be easier to say that most usages of metaclasses are pretty uncommon. 

For instance, let’s take a look at the __prepare__() method, which is available on every metaclass. It is responsible for preparing the namespace of class attributes. The default type for a class namespace is a plain dictionary. For years, the canonical example of the __prepare__() method was providing a collections.OrderedDict instance as the class namespace. Preserving the order of attributes in the class namespace allowed for things like repeatable object representation and serialization. But since Python 3.7, dictionaries are guaranteed to preserve key insertion order, so that use case is gone. But it doesn’t mean that we can’t play with namespaces. 

Let’s imagine the following problem: we have a large Python code base that was developed over dozens of years, and the majority of the code was written way before anyone in the team cared about coding standards. We may, for instance, have classes mixing camelCase and snake_case as the method naming convention. If we care about consistency, we would be forced to spend a tremendous amount of effort to refactor the whole code base into either of the naming conventions. Or we could just use some clever metaclass that could be added on top of existing classes and would allow for calling methods in both ways. We could write new code using the new calling convention (preferably snake_case) while leaving the old code untouched and waiting for a gradual update. 

That’s an example of a situation where __prepare__() can be used! Let’s start by writing a dict subclass that automatically interpolates camelCase names into snake_case keys: 

from typing import Any

import inflection


class CaseInterpolationDict(dict):
    def __setitem__(self, key: str, value: Any):
        super().__setitem__(key, value)
        super().__setitem__(inflection.underscore(key), value)

Note: To save some work, we use the inflection module, which is not a part of the standard library. It is able to convert strings between various “string cases”. You can install it from PyPI using pip: 

         pip install inflection 

Our CaseInterpolationDict class works almost like an ordinary dict type, but whenever it stores a new value, it saves it under two keys: the original one and one converted to snake_case. Note that we used the dict type as a parent class instead of the recommended collections.UserDict. This is because we will use this class in the metaclass __prepare__() method, and Python requires namespaces to be dict instances. 

Now it’s time to write the actual metaclass that will override the class namespace type. It will be surprisingly short: 

class CaseInterpolatedMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases):
        return CaseInterpolationDict()

Since we are set up, we can now use the CaseInterpolatedMeta metaclass to create a dummy class with a few methods that use the camelCase naming convention: 

class User(metaclass=CaseInterpolatedMeta):
    def __init__(self, firstName: str, lastName: str):
        self.firstName = firstName
        self.lastName = lastName

    def getDisplayName(self):
        return f"{self.firstName} {self.lastName}"

    def greetUser(self):
        return f"Hello {self.getDisplayName()}!"

Let’s save all that code in a case_class.py file and start an interactive session to see how the User class behaves: 

>>> from case_class import User 

The first important thing to notice is the contents of the User.__dict__ attribute: 

>>> User.__dict__
mappingproxy({
    '__module__': 'case_class',
    '__init__': <function case_class.User.__init__(self, firstName: str, lastName: str)>,
    'getDisplayName': <function case_class.User.getDisplayName(self)>,
    'get_display_name': <function case_class.User.getDisplayName(self)>,
    'greetUser': <function case_class.User.greetUser(self)>,
    'greet_user': <function case_class.User.greetUser(self)>,
    '__dict__': <attribute '__dict__' of 'User' objects>,
    '__weakref__': <attribute '__weakref__' of 'User' objects>,
    '__doc__': None
})

The first thing that catches the eye is that the methods got duplicated. That was exactly what we wanted to achieve. The second important thing is that User.__dict__ is of the mappingproxy type. That’s because Python always copies the contents of the namespace object to a new dict when creating the final class object. The mapping proxy also allows proxying access to superclasses within the class MRO. 

So, let’s see if our solution works by invoking all of its methods: 

>>> user = User("John", "Doe")
>>> user.getDisplayName()
'John Doe'
>>> user.get_display_name()
'John Doe'
>>> user.greetUser()
'Hello John Doe!'
>>> user.greet_user()
'Hello John Doe!'

It works! We could call all the snake_case methods even though we haven’t defined them. To an unaware developer, that could look almost like magic! 

However, this is the kind of magic that should be used very carefully. Remember that what you have just seen is a toy example. Its real purpose was to show what is possible with metaclasses in just a few lines of code. Learn more in the book Expert Python Programming, Fourth Edition by Michal Jaworski and Tarek Ziadé.

Summary 

In this article, we were first introduced to metaprogramming and eventually to the complex world of metaclasses. We explored the general syntax and practical usage of metaclasses. In the book, we delve further into advanced concepts, metaclass pitfalls, and the usage of the __init_subclass__() method as an alternative to metaclasses. 

About the Authors 

Michał Jaworski has more than 10 years of professional experience in writing software using various programming languages. Michał has spent most of his career writing high-performance and distributed backend services for web applications. He has served in various roles at multiple companies: from an ordinary software engineer to lead software architect. His beloved language of choice has always been Python. 

Tarek Ziadé is a software engineer, located in Burgundy, France. He works at Elastic, building tools for developers. Before Elastic, he worked at Mozilla for 10 years, and he founded a French Python User group, called AFPy. Tarek has also written several articles about Python for various magazines, and a few books in French and English.

Self-Serve Data Prep Tools Can Optimize SME Business Budgets and Resources

Small and medium sized businesses (SMEs) often find it difficult to balance the day-to-day need for data with the cost of employing data scientists or professional analysts to help with forecasting, analysis, data preparation and other complex analytical tasks.

A recent Gartner study found that by 2022, data preparation will become a critical capability in more than 60% of data integration, analytics/BI, data science, data engineering and data lake enablement platforms. If this assumption is correct, SMEs will have to find a way to satisfy the need for advanced analytics and fact-driven decision-making, if these businesses are going to grow and compete.

Data is everywhere in modern organizations and small and medium sized businesses are no exception! The tasks involved in gathering and preparing data for analysis are just the first steps. To make the best use of that data, the organization must have advanced analytics tools that can help them analyze and find patterns and trends in data and build analytics models. But these steps can be labor intensive and, without a suitable self-serve data preparation tool, the organization will have to employ the services of professional data scientists to get the job done.

Data prep and manipulation include data extraction, transformation, and loading (ETL), as well as shaping, reducing, combining, exploring, cleaning, sampling, and aggregating data. With a targeted self-serve data preparation tool, a midsized business can allow its business users to take on these tasks without SQL, ETL, or other programming or data science skills.
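As a rough illustration of the kind of work these tools automate, the same shaping, cleaning, combining, and aggregating steps look something like this when done by hand in Python with pandas; the data and column names are made up for the example:

import pandas as pd

# Tiny, made-up stand-in for a raw sales extract.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, None],
    "order_date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-10"]),
    "quantity": [2, 1, 5, 3],
    "unit_price": [10.0, 25.0, 4.0, 7.5],
})
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["North", "South"]})

orders = orders.dropna(subset=["customer_id"])                    # clean: drop incomplete rows
orders["revenue"] = orders["quantity"] * orders["unit_price"]     # shape: derive a new column
combined = orders.merge(customers, on="customer_id", how="left")  # combine two sources
monthly = (combined                                               # aggregate by month and region
           .groupby([pd.Grouper(key="order_date", freq="M"), "region"])["revenue"]
           .sum()
           .reset_index())
print(monthly)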

Augmented analytics features can help an SME organization to automate and enhance data engineering tasks and abstract data models, and use system guidance to quickly and easily prepare data for analysis to ensure data quality and accurate manipulation. With the right self-serve data preparation tools, users can explore data, use auto-recommendations to visualize the data in a way that is appropriate for a particular type of data analysis and leverage natural language processing (NLP) and machine learning to get at the data using simple search analytics that are familiar and commonly used in Google and other popular search techniques.

Because these sophisticated features are built with intuitive guidance and auto-recommendations, the user does not have to guess at how to prepare, visualize or analyze the data so results are accurate, easy to understand and suitable for sharing and reporting purposes. As small and medium sized organizations face the challenges of an ever-changing market and customer expectations, it will be more critical than ever to optimize business and data management and to make data available for strategic and day-to-day decisions. To manage budgets and schedules, SMEs will have to achieve more agility and flexibility and look to the business user community to increase data literacy and embrace business analytics.

New Study uses DNN to Predict 99% of Coronary Heart Disease Cases

  • Cardiovascular diseases are the primary cause of global deaths.
  • New model detects coronary heart disease with almost 99% accuracy.
  • DNN with hidden layers shows more accuracy than other models.

According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death globally, killing 17.9 million people in 2019 [1]. The WHO risk models identified many different variables as risk factors for CVDs, including the key predictor variables: age, blood pressure, body mass index, cholesterol, and tobacco use.  Historically, this potpourri of factors made CVDs almost impossible to predict with any meaningful accuracy. A new study by Kondeth Fathima and E. R. Vimina [2], published in Intelligent Sustainable Systems Proceedings of ICISS 2021, used Deep Neural Networks (DNNs) with four Hidden Layers (HDs) to predict CVDs with an impressive 99% accuracy.  

What is a DNN with Hidden Layers?

Neural network models have come to the forefront in recent years, gaining popularity because of their exceptional prediction capabilities. Many different deep learning techniques have been developed, including Convolutional Neural Networks (CNNs), used extensively for object recognition and classification, and Long Short-Term Memory units (LSTMs), widely used to detect anomalies in network traffic. This new study used a Deep Neural Network (DNN), known for its robustness to low and high data variations, generalizability to a wide range of applications, and scalability for additional data.  

DNNs can be single- or multi-layered and are defined as “an interconnected assembly of processing elements that act upon a function” [2]. The additional computational layers in multi-layered DNNs are called Hidden Layers (HLs); HLs repeat a process through many cycles. A neural network model with hidden layers can handle increasingly complex information, making it an ideal choice for analyzing data with multiple features, like data on cardiovascular disease. When modeling the intricacies of CVD risk factors, a network with more hidden layers gives better results than one with fewer layers. The goal of the study was to find the DNN with the optimal number of hidden layers, the one giving the best accuracy for predicting cardiovascular disease.

Methodology

The study authors used two datasets from the University of California at Irvine’s machine learning repository [3], Statlog and Cleveland. Both of these data sets are known for their data source reliability. After using Exploratory Data Analysis on the data, the researchers chose the best model based on accuracy performance on the two datasets.

Three different neural network models were studied, each with a different number of layers and neurons. After experimenting with various numbers of hidden layers, the researchers chose one with one input layer (IL), four HLs, and one output layer (OL).

Synthetic Minority Oversampling Technique (SMOTE) was used to increase and balance the number of cases in the imbalanced dataset, which contained disproportionate numbers of healthy and unhealthy cases. Mean imputation replaced the missing data, and the datasets were divided into a training set (70%) and a testing set (30%) containing equal proportions of healthy and unhealthy cases. The weights of the 13 features were optimized using gradient descent, and the data was scaled using standardization.
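The paper’s exact architecture and hyperparameters are not reproduced here, but a minimal Python sketch of this kind of pipeline (SMOTE balancing, a 70/30 split, standardization, and a network with one input layer, four hidden layers, and one output layer) might look as follows; the layer widths, training settings, and the random placeholder data are assumptions:

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Placeholder for the 13 heart-disease features and 0/1 labels (healthy / heart disease).
X = np.random.rand(300, 13)
y = np.random.randint(0, 2, 300)

X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)            # balance the classes
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.30, stratify=y_bal, random_state=42)  # 70/30 split
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# One input layer, four hidden layers, one output layer (layer widths are illustrative).
model = keras.Sequential([
    keras.layers.Input(shape=(13,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1, verbose=0)
print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])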

Results

Different metrics were used for evaluating the models, including accuracy, sensitivity, specificity, F1 score, misclassification, ROC, and AUC. The result was a four-HL DNN that detected coronary heart disease with “promising results”. The selected model gave accuracies of 98.77% on the Statlog dataset and 96.70% on the Cleveland dataset.

References

DNN Image (Top): Adobe Stock / Creative Cloud.

4-HL DNN Model by Author (Based on Kondeth Fathima and E. R. Vimina’s original Fig. 1). Background: Adobe Stock / Creative Cloud.

[1] Cardiovascular diseases (CVDs)

[2] Heart Disease Prediction Using Deep Neural Networks: A Novel Approach

[3] UCI Machine Learning Repository

Reframing Data Management:  Data Management 2.0

A cartoon making its way around social media asks the provocative question “Who wants clean data?” (everyone raises their hands) and then asks, “Who wants to CLEAN the data?” (nobody raises their hands). I took the cartoon one step further (apologies for my artistic skills) and asked, “Who wants to PAY for clean data?”, which shows everyone running for the exits (Figure 1).

Figure 1: Today’s Data Management Reality

Why does everyone run for the exits when asked to pay for data quality, data governance, and data management?  Because we do a poor job of connecting high-quality, complete, enriched, granular, low-latency data to the sources of business and operational value creation.

Data is considered the world’s most valuable resource and provides compelling financial results to organizations focused on exploiting the economics of data and analytics (Figure 2).

Figure 2: Industry-leading Data Monetization Organizations

Yet, most business executives are still reluctant to embrace the fundamental necessity of Data Management and fund it accordingly. If data is the catalyst for the economic growth of the 21st century, then it’s time we reframe how we view data management.  It’s time to talk about Data Management 2.0.

The Data Management Association (DAMA) has long been the data management champion. DAMA defines data management as “the planning, oversight, and control over the management and use of data and data-related sources”. DAMA has been instrumental in driving the development of data management procedures, practices, policies, and architecture (Figure 3).

Figure 3: DAMA Data Management Framework visualized by Denise Harders

The DAMA Data Management Framework is great for organizations seeking to understand how to manage their data. However, if data is “the world’s most valuable resource”, then we must re-invent data management into a business strategy.  We must help organizations understand how best to monetize or derive value from the application of data to their business (Figure 4).

Figure 4: Transforming Data Management

Before exploring the Laws of Data Management 2.0, let me define “Data Monetization”:

Data Monetization is the application of data to the business to drive quantifiable financial value.

While some organizations can sell their data, for the majority of organizations data monetization (or insights monetization) is about the application of the data to the organization’s top use cases to drive quantifiable financial value. Or as Doug Laney, author of the seminal book “Infonomics: How to Monetize, Manage, and Measure Information as an …” stated:

“If you are not quantifying the financial value that your organization derives from the use of data, then you are not doing data monetization”

Law #1:  Data is of no value in and of itself

Data possesses potential value but, in and of itself, provides zero realized value. As I discussed in “Introducing the 4 Stages of Data Monetization”, data in Stage 1 is a cost to be minimized.  Data in Stage 1 is burdened with the increasing costs associated with the storage, management, protection, and governance of the data, as well as the potential regulatory and compliance costs, liabilities, and fines associated with not properly managing or protecting one’s data (Figure 5).

Figure 5: 4 Stages of Data Monetization

Data Management 2.0 provides a more holistic methodology that doesn’t just stop at managing data but enables the application of data to the organization’s most important use cases to drive quantifiable financial value.

Law #2:  Not all data is of equal value

Many data management organizations waste precious resources (and business stakeholder street cred) by treating all the data the same way.  Fact: some data is more important than other data in helping to predict and optimize customer engagement, product performance, and business operations.

To determine which data elements are most important, Data Scientists can apply analytic techniques like Principal Component Analysis (PCA) and Random Forest to quantify the importance of a particular data element (or feature, something that I’ll discuss in my next blog) in optimizing the organization’s key use cases such as customer attrition, predictive product maintenance, unplanned operational downtime, improved healthcare results, or surviving the sinking of the Titanic (Figure 6).

Figure 6: Factors Predicting Titanic Sinking Survival
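
To make that concrete, here is a minimal sketch of the Random Forest approach, assuming scikit-learn and pandas; the Titanic-style file name and column names are illustrative assumptions, not from this article, and PCA could be applied in a similar fashion to quantify how much signal each component carries.

```python
# Minimal sketch: ranking data elements by predictive importance with a
# Random Forest. Assumes scikit-learn, pandas, and a Titanic-style CSV;
# the file path and column names are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("titanic.csv")  # hypothetical file

features = ["pclass", "sex", "age", "fare"]
X = pd.get_dummies(df[features], columns=["sex"]).fillna(0)
y = df["survived"]

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X, y)

# Higher importance means that data element contributes more to the
# prediction, and so deserves more data-quality and governance investment.
importance = pd.Series(model.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
```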

Data Management 2.0 operationalizes business stakeholder collaboration to identify, validate, value, and prioritize the use cases that deliver organizational value, and to identify and triage the KPIs and metrics against which value delivery will be measured.

Law #3:  One cannot ascertain the value of their data in isolation from the business

To identify which data variables are most important to the business, data management must start by understanding how the organization creates and measures value creation.  This conversation starts with an organization’s business and operational intent; that is, what is the organization trying to accomplish from a business and operational perspective over the next 12 to 18 months, and what are the measures or KPIs against which progress and success will be measured.

Data Management 2.0 reframes how organizations approach the application of data to the business by understanding how organizations create value (and where and how data can help create that value) instead of starting with the data (and hoping that the data finds its way to value). For more on how to do that, check out my book “The Art of Thinking Like a Data Scientist”, which provides an 8-step, collaborative process for engaging business stakeholders in identifying, validating, valuing, and prioritizing the organization’s most important business and operational use cases (Figure 7).

Figure 7: The Art of Thinking Like a Data Scientist

Law #4:  Turning everyone into Data Engineers is neither practical nor scalable

Finally, asking business stakeholders to manage their own data sources is impractical and dangerous. It opens the door to random, orphaned data management processes that may address tactical data and analytics needs, but at the expense of data and analytics’ strategic, economic value.

Data Management 2.0 empowers the entire organization to build, share, and refine its data and analytics assets, enabling organizations to unleash the business and economic value of their data.

If we believe that data is the new oil – that data will be the catalyst for economic growth in the 21st century – then we need to spend less time and investment trying to manage data and dramatically increase the time and investment spent monetizing it. That will require organizations to expand their data management capabilities to support the sharing, reuse, and continuous refinement of data and analytics assets to derive and drive new sources of customer, product, and operational value.

Damn it feels good to be a data gangsta!


A taxonomy of Transformer based pre-trained language models (TPTLM)

We follow on from our two previous posts:

Opportunities and Risks of foundation models

Understanding self supervised learning

In this post, we look at the taxonomy of TPTLM – Transformer-based pre-trained language models.

The post is based on a paper which covers this topic extensively:

AMMUS : A Survey of Transformer-based Pretrained Models in Natural …

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, and Sivanesan Sa…

Transformer-based pre-trained language models (TPTLM) are a complex and fast-growing area of AI, so I recommend this paper as a good way to understand and navigate the landscape.

We can classify TPTLMs from four perspectives:

  • Pretraining Corpus
  • Model Architecture
  • Type of SSL (self-supervised learning)
  • Extensions

Pretraining Corpus-based models

General pretraining: Models like GPT-1 and BERT are pretrained on a general corpus. For example, GPT-1 is pretrained on the BooksCorpus, while BERT and UniLM are pretrained on English Wikipedia and the BooksCorpus. This form of training draws on general text from multiple sources of information.

Social Media-based: Models could be trained on social media text.

Language-based: Models could be trained on a single language (monolingual) or on multiple languages (multilingual).

Architecture

TPTLMs could also be classified based on their architecture. A T-PTLM can be pretrained using a stack of encoders, a stack of decoders, or both.

Hence, you could have architectures that are (see the short loading sketch after the list below):

  • Encoder-based
  • Decoder-based
  • Encoder-Decoder based
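
As a small illustration of the three families, the sketch below loads one model of each type. It assumes the Hugging Face `transformers` library; the checkpoint names are just well-known examples, not models discussed in the survey.

```python
# Illustrative sketch of the three architecture families using well-known
# checkpoints from the Hugging Face `transformers` library.
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

encoder_only = AutoModel.from_pretrained("bert-base-uncased")         # encoder stack (BERT)
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")           # decoder stack (GPT)
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # encoder + decoder (T5)
```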

Self-supervised learning (SSL)

SSL is one of the key ingredients in building T-PTLMs.

A T-PTLM can be developed by pretraining using Generative, Contrastive, Adversarial, or Hybrid SSL (a minimal masked-language-modeling sketch follows the list below). Hence, based on the SSL used, you could have:

  • Generative SSL
  • Contrastive SSL
  • Adversarial SSL
  • Hybrid SSL
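
As a concrete, hedged illustration of generative SSL, the sketch below computes a BERT-style masked-language-modeling loss. It assumes PyTorch and the Hugging Face `transformers` library; the checkpoint and sample text are illustrative only.

```python
# Minimal sketch of generative SSL: the masked-language-modeling objective
# used to pretrain BERT-style T-PTLMs. Assumes PyTorch + `transformers`.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "Transformer based pre-trained language models learn from raw text."
inputs = tokenizer(text, return_tensors="pt")

# Randomly mask ~15% of the non-special tokens, as in BERT's objective.
# (On a short sentence it can happen that no token is masked; rerun if so.)
labels = inputs["input_ids"].clone()
special = torch.tensor(
    tokenizer.get_special_tokens_mask(labels[0].tolist(), already_has_special_tokens=True)
).bool()
mask = (torch.rand(labels.shape) < 0.15) & ~special.unsqueeze(0)
inputs["input_ids"][mask] = tokenizer.mask_token_id
labels[~mask] = -100  # loss is computed only on the masked positions

loss = model(**inputs, labels=labels).loss
print(f"MLM loss on this toy batch: {loss.item():.3f}")
```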

Extensions

Based on extensions, you can classify TPTLMs into the following categories:

  • Compact T-PTLMs: aim to reduce the size of T-PTLMs and make them faster using a variety of model compression techniques like pruning, parameter sharing, knowledge distillation, and quantization (a small quantization sketch follows this list).
  • Character-based T-PTLMs: build word representations from characters. For example, CharacterBERT uses a CharCNN + Highway layer to generate word representations from character embeddings and then applies transformer encoder layers; AlphaBERT is another example.
  • Green T-PTLMs: focus on environmentally friendly methods
  • Sentence-based T-PTLMs: extend T-PTLMs like BERT to generate quality sentence embeddings.
  • Tokenization-Free T-PTLMs: avoid the use of explicit tokenizers to split input sequences, catering for languages such as Chinese or Thai that do not use white space or punctuation as word separators.
  • Large-Scale T-PTLMs: performance of T-PTLMs is strongly related to the scale of the model rather than its depth or width alone; these models aim to increase the number of parameters.
  • Knowledge-Enriched T-PTLMs: standard T-PTLMs are developed by pretraining over large volumes of text data and learn only the knowledge contained in that text; knowledge-enriched variants inject external knowledge (for example, from knowledge graphs) into the model.
  • Long-Sequence T-PTLMs: self-attention variants like sparse self-attention and linearized self-attention are proposed to reduce the quadratic complexity of self-attention and hence extend T-PTLMs to long input sequences.
  • Efficient T-PTLMs: e.g., DeBERTa, which improves on BERT using a disentangled attention mechanism and an enhanced mask decoder.
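
For the compact-model category above, here is a minimal sketch of one of the listed compression techniques, post-training dynamic quantization, applied to a BERT-style encoder. It assumes PyTorch and the `transformers` library; the checkpoint name is illustrative, and distillation or pruning would be separate steps.

```python
# Minimal sketch of one compression technique named above: post-training
# dynamic quantization of a BERT-style encoder's Linear layers to int8.
# Assumes PyTorch + the Hugging Face `transformers` library.
import os
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    """Rough on-disk size of a model's state dict, in megabytes."""
    torch.save(m.state_dict(), "_tmp_weights.pt")
    mb = os.path.getsize("_tmp_weights.pt") / 1e6
    os.remove("_tmp_weights.pt")
    return mb

print(f"original:  {size_mb(model):.0f} MB")      # roughly 440 MB for bert-base
print(f"quantized: {size_mb(quantized):.0f} MB")  # substantially smaller
```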

This is a complex area, and I hope the taxonomy above is useful. The paper I referred to provides more detail and makes a great effort to explain such a complex landscape.

The post is based on a paper which covers this topic extensively (the paper is also the source of the images):

AMMUS : A Survey of Transformer-based Pretrained Models in Natural …

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, and Sivanesan Sa…

