**Identify **a machine learning problem

**Use **basic machine learning techniques

**Think **about your data/results

Machine learning is a part of data science that deals with programming the computer in such a way that it automatically learn and grow with experience. It means recognition and understanding the input dataset. It caters to all the decisions based on all possible inputs.

To fix this problem, some algorithms are developed with **R programming**. These algorithms build knowledge from specific data and statistical analysis, probability theory, logic, combinatoric optimization, search, reinforcement learning, and Data control.

The developed algorithms form the basis of R programming such as:

- Regression Model
- Decision Tree
- Classification
- Cluster
- K-Nearest Neighbors

### Regression Model:

It is a Statistical approach to find the relationship between two or more variable. It is part of supervise learning. It maintains the relationship between dependent and independent variable. It is used to build regression model with a minimum square error of the fitted value.

#### Types of Regression model

**Simple Linear Regression:** It has two main objectives stating establishment between two variables. Ex: Income and spending, Higher income will spend more.

**Multiple Linear Regression:** It maintains the relationship between one or more predictor variables.

**Non-Linear Regression:** It maintains the relationship between predictor and response variable.

**Example:** Simple Linear Regression

y = ax + b

Where

y is the response variable.

x is the predictor variable.

a and b are constants which are called the coefficients.

# Value in Height

x = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# Value in Weight

y = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

SLRModel = lm(y~x)

print(SLRModel)

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-38.4551 0.6746

print(summary(SLRModel))

### Decision Tree:

It occurs in the Classification problem. It works for both like categorical and continuous input and output variables.

**Tasks of classification:**

- Automatically assign a class to observations with features.
- Observation: vector of features, with a class.
- Automatically assign a class to new observation with features, using previous observations.
- Binary classification: two classes.
- Multiclass classification: more than two classes.

**Example**

A dataset consisting of persons

Features: age, weight and income

Class: happy or not happy

Multiclass: happy, satisfied or not happy

**The decision tree Example**

Suppose you’re classifying patients as sick or not sick.

### Classification:

It is the statistical problem of identifying a set of categories in a new observation, on the basis of a training group of data containing observations (or instances) which is category membership is known.

**Classification Problem**

**Accuracy** and **Error**

System is right or wrong

Accuracy goes up when Error goes down

Accuracy = correctly classified instances/total amount of classified instances

Error = 1 – Accuracy

**Example:**

Squares with 2 features: small/big and solid/dotted

Label: colored/not colored

Binary classification problem

**Limits of accuracy**

- Classifying very rare heart disease
- Classify all as negative (not sick)
- Predict 99 correct (not sick) and miss
- 1 Accuracy: 99%
- Bogus… you miss every positive case!

**Goal: **predict category of new observation

**Classification Applications**

**Medical ****Diagnosis:- **Sick and Not Sick

**Animal** **Recognition:- **Dog, Cat and Horse

**Important: **Qualitative Output, Predefined Classes

### Cluster:

Clustering is a technique to club a set of objects in such a way that the objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to that is in other groups (clusters).

It is the main task to analyze data, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bio-informatics, data compression, data mining and system graphics.

It is the step of dividing the data points into a number of sets such that data points in the same sets are more similar to other data points in the same group than those in other sets. In simple words, the part is to segregate sets with similar traits and assign them into clusters.

There are the following cluster properties:

- Grouping objects in clusters
- Collection of objects
- Similar within cluster
- Dissimilar between clusters
- Grouping objects in clusters
- No labels: unsupervised
- Classification Plenty possible clusterings
- Similar within cluster
- Dissimilar between clusters

**Example:**

Grouping similar animal photos

No label information

Need distance metric between points

No right or wrong

Plenty possible clusterings

Performance measure consists of 2 elements

Similarity within each cluster (Bottom to up)

Similarity between clusters ( Top to bottom)

### K-Nearest Neighbors:

K-Nearest neighbor is an algorithm that stores all available cases and classifies new cases by a large number of votes of its k neighbors. This algorithm apartheid unlabeled data points into well-defined sets.

K-nearest neighbor algorithm adds to its basic algorithm that after the distance of the new point to all stored data points has been calculated, the distance values are sorted and the K-nearest neighbors are determined. The labels of these neighbors are gathered and a majority vote or weighted vote is used for classification or regression purposes.

- No real model like
**decision****t****r****ee** **C****ompa****r****e**unseen instances to training set**P****r****edict**using the**c****omparison**of**unseen****data**and the**t****r****aining****set**

Form of **instan****c****e****-b****a****sed** **learning
**Simplest form: 1-Nearest Neighbor or Nearest Neighbor

**Example:**

2 **features**: X1 and X2

**Class**: red or blue

Binary classification

Save complete training set

Given: unseen observation with features ** X = **(1.3, -2)

Compare training set with new observation

Find

**closest**observation — nearest neighbor — and assign

**same class**

just Euclidean distance, nothing fancy

**k **is the amount of neighbors If **k **= 5

Use 5 most similar observations (neighbors)

Assigned class will be the most represented class within the 5 neighbors.

## Summary

Machine learning is taking over the world-wide. These are the basic algorithm in machine learning. It is a growing need between organizations for professionals to know the inputs and outputs of machine learning.

These algorithms help to statistical problems using concepts like regression, clustering, classification, and prediction. These are mainly used for predictions, classification and verification basic of the continuous or categorical dataset.

It should be useful to everyone from business executives to ML practitioners. It covers virtually all aspects of machine learning (and many related fields) at a high level and should serve as a sufficient introduction or reference to the terminology, concepts, tools, considerations, and techniques in the field.

Nice blog about Machine Learning as well as R programming