Mini Diploma in Artificial Intelligence and Machine Learning

Week 2: Self-Study Notes

Study-Friendly Structure

Each section includes:

Key Concepts (Theory made simple).
Practical Steps (Hands-on examples).
Quick Exercises (Practice questions to reinforce learning).

English Version

ML Workflow and Algorithms

1. Data Preprocessing

Data preprocessing prepares raw data for ML models by cleaning, transforming, and structuring it.

Key Concepts:

Handling Missing Values: Replace missing values with the mean, the median, or a placeholder.
Data Normalization: Scale features to a common range (e.g., Min-Max scaling maps each feature to [0, 1]).
Encoding Categorical Data: Convert categories to numerical values (e.g., one-hot encoding creates one 0/1 column per category).
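
Min-Max scaling and one-hot encoding, two of the concepts above, can be sketched as follows. This is a minimal illustration rather than a recommended pipeline; the toy data and the column names `Age_scaled` and `City` are assumptions invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data, invented for illustration
df = pd.DataFrame({
    "Age": [22, 25, 30],
    "City": ["Colombo", "Kandy", "Colombo"],
})

# Min-Max scaling: x_scaled = (x - min) / (max - min), so values land in [0, 1]
scaler = MinMaxScaler()
df["Age_scaled"] = scaler.fit_transform(df[["Age"]]).ravel()

# One-hot encoding: one 0/1 column per category in "City"
encoded = pd.get_dummies(df, columns=["City"], dtype=int)
print(encoded)
```

The scaler is fitted on the data it transforms here; in a real project it would normally be fitted on the training data only and then reused on new data.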

Practical Steps:

Import necessary libraries:
python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
Handle missing values:
python
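import pandas as pd

data = {'Age': [25, 30, None, 22], 'Salary': [50000, 60000, None, 40000]}
df = pd.DataFrame(data)

# Assign the result back instead of calling fillna(..., inplace=True) on a
# selected column, which newer pandas versions warn about and may not apply to df.
df['Age'] = df['Age'].fillna(df['Age'].mean())
# Fill the missing Salary the same way so no gaps remain.
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
print(df)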
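
The mean is only one of the fill strategies listed under Key Concepts. The sketch below shows the other two, a median fill and a placeholder fill. The `City` column and the placeholder string `"Unknown"` are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical toy data with one numeric and one text column
df = pd.DataFrame({
    "Age": [25, 30, None, 22],
    "City": ["Colombo", None, "Kandy", "Galle"],
})

# Median fill: less sensitive to outliers than the mean
df["Age"] = df["Age"].fillna(df["Age"].median())

# Placeholder fill: a common choice for missing text categories
df["City"] = df["City"].fillna("Unknown")
print(df)
```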

Quick Exercise:

Create a dataset with 3 columns and introduce missing values. Fill those values using the mean.

2. Supervised Learning Algorithms

Supervised learning trains a model on labeled data (inputs paired with known outputs) so that it can predict outcomes for new inputs.

Key Algorithms:

Linear Regression: Predicts continuous values (e.g., house prices).
Logistic Regression: Predicts binary outcomes (e.g., spam detection).

Practical Steps:

Linear Regression Example:
python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3]])  # Features
y = np.array([1, 2, 3])  # Target

model = LinearRegression()
model.fit(X, y)
print("Prediction for 4:", model.predict([[4]]))
Logistic Regression Example:
python
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3]]  # Features
y = [0, 1, 1]  # Binary labels

model = LogisticRegression()
model.fit(X, y)
print("Prediction for 4:", model.predict([[4]]))
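
To see what the two models learn, the sketch below reuses the same toy data. It prints the line fitted by linear regression and the class probabilities behind the logistic regression prediction. The variable names `lin`, `clf`, and `proba` are chosen for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3]])

# Linear regression learns a line: y = slope * x + intercept
lin = LinearRegression().fit(X, np.array([1, 2, 3]))
print("Slope:", lin.coef_[0], "Intercept:", lin.intercept_)

# Logistic regression estimates a probability for each class;
# predict() returns the class with the higher probability
clf = LogisticRegression().fit(X, np.array([0, 1, 1]))
proba = clf.predict_proba([[4]])
print("P(class 0), P(class 1) for x=4:", proba[0])
```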

Quick Exercise:

Create a simple dataset and train a Linear Regression model to predict test scores based on hours studied.

3. Model Evaluation

Model evaluation measures how well your ML model performs on unseen data, meaning data it was not trained on.
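
One common way to obtain unseen data is to hold back part of the dataset before training. The sketch below uses scikit-learn's `train_test_split` for this. The data, the 30% test size, and the `random_state` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data that follows y = 2x + 1 exactly
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

# Hold back 30% of the rows; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2 for regressors, measured here on the held-out rows
print("R^2 on held-out data:", model.score(X_test, y_test))
```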

Key Metrics:

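Accuracy: Percentage of predictions that are correct.
Precision & Recall: Precision is the share of predicted positives that are truly positive; recall is the share of actual positives the model finds. Both are more informative than accuracy on imbalanced datasets.
F1 Score: Harmonic mean of precision and recall, which balances the two.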
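
The metrics above can be computed with scikit-learn as sketched below. The labels are a small hypothetical example; with them the model makes 2 true positives, 0 false positives, and 1 false negative.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical true labels and predictions
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 2
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 3
f1 = f1_score(y_true, y_pred)                # 2 * P * R / (P + R)

print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
```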

Practical Steps:

Calculate accuracy:
python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1]  # Actual labels
y_pred = [1, 0, 1, 0]  # Predicted labels

print("Accuracy:", accuracy_score(y_true, y_pred))

Quick Exercise:

Implement a confusion matrix for a small dataset using scikit-learn.

Sinhala Version

ML Workflow and Algorithms

1. Data Preprocessing

Data preprocessing is the process of cleaning, transforming, and structuring data.

Key Concepts:

Handling Missing Values: Fill missing values with the mean, the median, or another value.
Data Normalization: Scale features to lie between a minimum and a maximum value.
Encoding Categorical Data: Convert categories to numerical values (e.g., One-Hot Encoding).

Practical Steps:

Import the necessary libraries:
python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
Fill missing values:
python
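import pandas as pd

data = {'Age': [25, 30, None, 22], 'Salary': [50000, 60000, None, 40000]}
df = pd.DataFrame(data)

# Assign the result back instead of calling fillna(..., inplace=True) on a
# selected column, which newer pandas versions warn about and may not apply to df.
df['Age'] = df['Age'].fillna(df['Age'].mean())
# Fill the missing Salary the same way so no gaps remain.
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
print(df)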

Quick Exercise:

Create a dataset with three columns. Use the mean to fill its missing values.

2. Supervised Learning Algorithms

Supervised learning uses labeled data to make predictions.

Key Algorithms:

Linear Regression: Predicts continuous values.
Logistic Regression: Predicts binary outcomes.

Practical Steps:

Linear Regression:
python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3]])  # Features
y = np.array([1, 2, 3])  # Target

model = LinearRegression()
model.fit(X, y)
print("Prediction for 4:", model.predict([[4]]))

Quick Exercise:

Create a small dataset and train a Linear Regression model.

3. Model Evaluation

Model evaluation measures how well the model performs.

Key Metrics:

Accuracy: The share of predictions that are correct.
Precision & Recall: How many predicted positives are correct, and how many actual positives are found.
F1 Score: A balance between Precision and Recall.

Practical Steps:

Calculate accuracy:
python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1]  # Actual labels
y_pred = [1, 0, 1, 0]  # Predicted labels

print("Accuracy:", accuracy_score(y_true, y_pred))

Quick Exercise:

Build a confusion matrix for a small dataset using scikit-learn.