Machine Learning - Hypothesis

LAVI

Hypothesis

H is the set of candidate hypotheses; h is one member of H

Hypothesis h

An idea or a proposed explanation
Verified through experiments and investigation

Hypothesis space H

The set of all possible hypotheses
Defined by some representation, e.g., linear functions
A learning algorithm searches H for the h that best fits the observations in the given dataset

  1. Most Specific Hypotheses
    Consistent with the observed training examples; narrowing them any further would make them inconsistent
  2. Most General Hypotheses
    Consistent with the observed training examples; broadening them any further would make them inconsistent

General Boundary: the loosest edge of the version space; any hypothesis that covers no negative example belongs to it
Specific Boundary: the strictest edge of the version space; only hypotheses that just cover all the positive examples belong to it

Version Space

Given a hypothesis space H and a training dataset, the version space is the subset of H containing all hypotheses consistent with that dataset, i.e., the rectangular region in the figure above
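As a minimal sketch of this definition, the version space can be computed by brute force: enumerate every hypothesis and keep the ones consistent with the training data. The two-attribute domains and the toy dataset here are made up for illustration:

```python
from itertools import product

# Toy attribute domains (a much smaller example than the car data below).
domains = [("Low", "High"), ("Weak", "Strong")]

def matches(h, x):
    """A hypothesis matches an instance if every constraint is '?' or equal."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, data):
    """h is consistent if it classifies every training example correctly."""
    return all(matches(h, x) == label for x, label in data)

# Each slot is a concrete value or "?", plus one hypothesis ("None", "None")
# that rejects every instance.
hypotheses = list(product(*[d + ("?",) for d in domains])) + [("None",) * 2]

data = [(("High", "Strong"), True), (("Low", "Weak"), False)]
version_space = [h for h in hypotheses if consistent(h, data)]
print(version_space)  # [('High', 'Strong'), ('High', '?'), ('?', 'Strong')]
```

Note that ("?", "?") is excluded: it covers the negative example, so it is outside the general boundary.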

X: The Input Instance Space

Target function:
Six input attributes:

Feature        Number of categories
Price          3
Engine Power   2
Maintenance    2
Doors          2 ({2, 4&more})
Trunk Size     2
Safety         2

There are 6 input attributes in the data set
Size of the input instance space: 3 × 2 × 2 × 2 × 2 × 2 = 96
Syntactically distinct hypotheses (each attribute gains two extra values, "?" and "∅"): 5 × 4 × 4 × 4 × 4 × 4 = 5120
Semantically distinct hypotheses (each attribute gains one extra value "?", plus 1 for the single hypothesis that rejects every instance): 1 + 4 × 3 × 3 × 3 × 3 × 3 = 973
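These counts follow directly from the category numbers in the table; a quick check:

```python
from math import prod

# Categories per attribute: Price, Engine Power, Maintenance,
# Doors, Trunk Size, Safety (taken from the table above).
categories = [3, 2, 2, 2, 2, 2]

# Input instance space: every combination of attribute values.
instances = prod(categories)                    # 3 * 2^5 = 96

# Syntactically distinct hypotheses: each slot also allows "?" and "∅",
# so each attribute contributes (k + 2) choices.
syntactic = prod(k + 2 for k in categories)     # 5 * 4^5 = 5120

# Semantically distinct hypotheses: each slot allows "?" (k + 1 choices),
# plus one hypothesis that rejects every instance.
semantic = 1 + prod(k + 1 for k in categories)  # 1 + 4 * 3^5 = 973

print(instances, syntactic, semantic)  # 96 5120 973
```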

Don’t care value: "?" (any value of the attribute is acceptable)
No value allowed: "∅" (no value of the attribute is acceptable)

General-to-Specific Ordering over H

Let h1 = <?, ?, ?, 4&more, ?, High>
h2 = <?, Moderate, ?, 4&more, ?, High>

then any instance that satisfies h2 also satisfies h1.
h1 is more_general_than_or_equal_to h2, denoted h1 ≥g h2; equivalently, h2 is more_specific_than_or_equal_to h1.
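For the attribute-vector representation used here, the ≥g relation can be checked slot by slot, as in this small sketch (hypotheses written as tuples):

```python
def more_general_or_equal(h1, h2):
    # h1 >=g h2: every slot of h1 is "?" (unconstrained) or
    # identical to the corresponding slot of h2.
    return all(a == "?" or a == b for a, b in zip(h1, h2))

h1 = ("?", "?", "?", "4&more", "?", "High")
h2 = ("?", "Moderate", "?", "4&more", "?", "High")
print(more_general_or_equal(h1, h2))  # True
print(more_general_or_equal(h2, h1))  # False
```

h1 ≥g h2 holds because h1 leaves Engine Power unconstrained where h2 demands Moderate; the reverse direction fails on that same slot.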

Candidate-Elimination Algorithm

Iteratively tightens the version space by making the specific boundary more general and the general boundary more specific

  • A positive example makes the specific boundary more general
  • A negative example makes the general boundary more specific
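A compact sketch of these two update rules, under the usual simplifying assumption of a conjunctive hypothesis space so the specific boundary S can be kept as a single hypothesis ("0" marks a slot with no value yet); the two-attribute domains and training examples are made up:

```python
def covers(h, x):
    # h covers x when every constraint is "?" or equals the attribute value.
    return all(c == "?" or c == v for c, v in zip(h, x))

def mge(h1, h2):
    # h1 >=g h2, slot-wise; "0" in h2 means "no value yet" (most specific).
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))

def min_generalize(s, x):
    # Minimally generalize s so that it covers the positive example x.
    return tuple(v if c == "0" else (c if c == v else "?")
                 for c, v in zip(s, x))

def min_specialize(g, domains, x):
    # All minimal specializations of g that exclude the negative example x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = ("0",) * len(domains)    # specific boundary (single hypothesis)
    G = [("?",) * len(domains)]  # general boundary
    for x, positive in examples:
        if positive:
            S = min_generalize(S, x)            # generalize S
            G = [g for g in G if covers(g, x)]  # drop inconsistent g's
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:                            # specialize g
                    new_G.extend(h for h in min_specialize(g, domains, x)
                                 if mge(h, S))
            G = new_G
    return S, G

domains = [("Low", "High"), ("Weak", "Strong")]
examples = [(("High", "Strong"), True), (("High", "Weak"), False)]
S, G = candidate_elimination(examples, domains)
print(S, G)  # ('High', 'Strong') [('?', 'Strong')]
```

After one positive and one negative example, the version space is everything between S = (High, Strong) and G = [(?, Strong)].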

Reference

  • Prof. 黃貞瑛's Machine Learning course - Concept Learning