Machine Learning - Hypothesis

LAVI

Hypothesis

H is the set of candidate hypotheses; h is one member of H

Hypothesis h

An idea or a proposed explanation
Verified through experiments and investigation

Hypothesis space H

The set of all possible hypotheses
Defined by some representation, e.g., linear functions
A learning algorithm searches H for the h that best fits the observations in the given dataset

  1. Most Specific Hypotheses
    Consistent with the observed training examples; narrowing them any further would make them inconsistent
  2. Most General Hypotheses
    Consistent with the observed training examples; broadening them any further would make them inconsistent

General Boundary: the loosest edge of the version space; any hypothesis that covers no negative example belongs to it
Specific Boundary: the strictest edge of the version space; only hypotheses that just cover all the positive examples belong to it

Version Space

Given a hypothesis space H and a training dataset, the version space is the subset of H containing all hypotheses consistent with that dataset, i.e., the rectangular region in the figure above
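As a minimal sketch of this definition, the version space can be computed by brute force: enumerate every hypothesis and keep the ones consistent with the training data. The two-attribute domains and the toy dataset here are made up for illustration:

```python
from itertools import product

# Toy attribute domains (a much smaller example than the car data below).
domains = [("Low", "High"), ("Weak", "Strong")]

def matches(h, x):
    """A hypothesis matches an instance if every constraint is '?' or equal."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, data):
    """h is consistent if it classifies every training example correctly."""
    return all(matches(h, x) == label for x, label in data)

# Each slot is a concrete value or "?", plus one hypothesis ("None", "None")
# that rejects every instance.
hypotheses = list(product(*[d + ("?",) for d in domains])) + [("None",) * 2]

data = [(("High", "Strong"), True), (("Low", "Weak"), False)]
version_space = [h for h in hypotheses if consistent(h, data)]
print(version_space)  # [('High', 'Strong'), ('High', '?'), ('?', 'Strong')]
```

Note that ("?", "?") is excluded: it covers the negative example, so it is outside the general boundary.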

X: The Input Instance Space

Target function:
Six input attributes:

Feature        Number of categories
Price          3
Engine Power   2
Maintenance    2
Doors          2 ({2, 4&more})
Trunk Size     2
Safety         2

There are 6 input attributes in the data set
Size of the input instance space: 3 × 2 × 2 × 2 × 2 × 2 = 96
Syntactically distinct hypotheses (each attribute gains two extra values, "?" and "∅"): 5 × 4 × 4 × 4 × 4 × 4 = 5120
Semantically distinct hypotheses (each attribute gains one extra value "?", plus 1 for the single hypothesis that rejects every instance): 1 + 4 × 3 × 3 × 3 × 3 × 3 = 973
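These counts follow directly from the category numbers in the table; a quick check:

```python
from math import prod

# Categories per attribute: Price, Engine Power, Maintenance,
# Doors, Trunk Size, Safety (taken from the table above).
categories = [3, 2, 2, 2, 2, 2]

# Input instance space: every combination of attribute values.
instances = prod(categories)                    # 3 * 2^5 = 96

# Syntactically distinct hypotheses: each slot also allows "?" and "∅",
# so each attribute contributes (k + 2) choices.
syntactic = prod(k + 2 for k in categories)     # 5 * 4^5 = 5120

# Semantically distinct hypotheses: each slot allows "?" (k + 1 choices),
# plus one hypothesis that rejects every instance.
semantic = 1 + prod(k + 1 for k in categories)  # 1 + 4 * 3^5 = 973

print(instances, syntactic, semantic)  # 96 5120 973
```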

Don’t care value: "?" (any value of the attribute is acceptable)
No value allowed: "∅" (no value of the attribute is acceptable)

General-to-Specific Ordering over H

Let h1 = <?, ?, ?, 4&more, ?, High>
h2 = <?, Moderate, ?, 4&more, ?, High>

then any instance that satisfies h2 also satisfies h1.
h1 is more_general_than_or_equal_to h2, denoted h1 ≥g h2; equivalently, h2 is more_specific_than_or_equal_to h1.
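For the attribute-vector representation used here, the ≥g relation can be checked slot by slot, as in this small sketch (hypotheses written as tuples):

```python
def more_general_or_equal(h1, h2):
    # h1 >=g h2: every slot of h1 is "?" (unconstrained) or
    # identical to the corresponding slot of h2.
    return all(a == "?" or a == b for a, b in zip(h1, h2))

h1 = ("?", "?", "?", "4&more", "?", "High")
h2 = ("?", "Moderate", "?", "4&more", "?", "High")
print(more_general_or_equal(h1, h2))  # True
print(more_general_or_equal(h2, h1))  # False
```

h1 ≥g h2 holds because h1 leaves Engine Power unconstrained where h2 demands Moderate; the reverse direction fails on that same slot.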

Candidate-Elimination Algorithm

Iteratively tightens the version space by making the specific boundary more general and the general boundary more specific

  • A positive example makes the specific boundary more general
  • A negative example makes the general boundary more specific
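A compact sketch of these two update rules, under the usual simplifying assumption of a conjunctive hypothesis space so the specific boundary S can be kept as a single hypothesis ("0" marks a slot with no value yet); the two-attribute domains and training examples are made up:

```python
def covers(h, x):
    # h covers x when every constraint is "?" or equals the attribute value.
    return all(c == "?" or c == v for c, v in zip(h, x))

def mge(h1, h2):
    # h1 >=g h2, slot-wise; "0" in h2 means "no value yet" (most specific).
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))

def min_generalize(s, x):
    # Minimally generalize s so that it covers the positive example x.
    return tuple(v if c == "0" else (c if c == v else "?")
                 for c, v in zip(s, x))

def min_specialize(g, domains, x):
    # All minimal specializations of g that exclude the negative example x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = ("0",) * len(domains)    # specific boundary (single hypothesis)
    G = [("?",) * len(domains)]  # general boundary
    for x, positive in examples:
        if positive:
            S = min_generalize(S, x)            # generalize S
            G = [g for g in G if covers(g, x)]  # drop inconsistent g's
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:                            # specialize g
                    new_G.extend(h for h in min_specialize(g, domains, x)
                                 if mge(h, S))
            G = new_G
    return S, G

domains = [("Low", "High"), ("Weak", "Strong")]
examples = [(("High", "Strong"), True), (("High", "Weak"), False)]
S, G = candidate_elimination(examples, domains)
print(S, G)  # ('High', 'Strong') [('?', 'Strong')]
```

After one positive and one negative example, the version space is everything between S = (High, Strong) and G = [(?, Strong)].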

Reference

  • Prof. 黃貞瑛's Machine Learning course - Concept Learning