sklearn.cluster.KMeans explained

sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10,
        max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0,
        random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

n_clusters: number of clusters to form (and centroids to generate), int, optional, default: 8.

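init: initialization method, default: 'k-means++'; one of {'k-means++', 'random', or an ndarray of shape (n_clusters, n_features) giving the initial centers}.

n_init: number of times the algorithm is run with different centroid seeds; the run with the lowest inertia is kept. int, default: 10.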
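The three kinds of init value can be tried side by side. This is only a minimal sketch: the toy array X and the hand-picked starting centers in start are assumptions made for illustration. n_init is passed explicitly so the result does not depend on the installed version's default, and precompute_distances and n_jobs are left out because newer scikit-learn releases may not accept them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (assumed for illustration): two well-separated groups.
X = np.array([[0, 0], [0, 2], [-1, 1], [1, 1],
              [4, 0], [4, 2], [3, 1], [5, 1]], dtype=float)

# 1) Default seeding scheme.
km_pp = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0).fit(X)

# 2) Initial centers drawn at random from the data.
km_rand = KMeans(n_clusters=2, init='random', n_init=10, random_state=0).fit(X)

# 3) Explicit starting centers, shape (n_clusters, n_features).
#    A fixed array leaves nothing to re-seed, so a single run is enough.
start = np.array([[0.0, 0.0], [5.0, 1.0]])
km_arr = KMeans(n_clusters=2, init=start, n_init=1).fit(X)

for name, km in [('k-means++', km_pp), ('random', km_rand), ('ndarray', km_arr)]:
    print(name, km.inertia_)
```

With an explicit ndarray, cluster 0 grows out of the first row of start, so the cluster numbering is under your control.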

max_iter: maximum number of iterations for a single run, int, default: 300.

tol: tolerance used to declare convergence, float, default: 1e-4.
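To see how max_iter and tol decide when a run stops, the fitted n_iter_ attribute can be inspected. The sketch below uses synthetic Gaussian blobs (an assumption, not data from these notes) and starts both models from the same random centers, so the only difference is the iteration cap.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Three synthetic Gaussian blobs (assumed data for this sketch).
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Same seed and init, so both runs start from identical centers.
capped = KMeans(n_clusters=3, init='random', n_init=1,
                max_iter=1, random_state=0).fit(X)
full = KMeans(n_clusters=3, init='random', n_init=1,
              max_iter=300, tol=1e-4, random_state=0).fit(X)

# n_iter_ is the number of iterations actually performed.
print(capped.n_iter_, capped.inertia_)
print(full.n_iter_, full.inertia_)
```

The capped run stops after one iteration, while the full run continues until the centers move by less than the tolerance, so its inertia should be no worse.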

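precompute_distances: whether to precompute distances and store them (faster but uses more memory); one of {'auto', True, False}. With 'auto', distances are not precomputed if n_samples * n_clusters > 12 million.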

verbose: verbosity mode, int, default: 0.

random_state: seed or generator used for centroid initialization; int, RandomState instance or None, optional, default: None (if None, the random number generator is the RandomState instance used by np.random).
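A quick way to check what random_state buys you is to fit twice with the same integer seed and compare the results. The data here is synthetic and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(42)
X = rng.rand(200, 2)  # synthetic points in the unit square (assumed data)

# Two independent fits with the same integer seed.
a = KMeans(n_clusters=4, n_init=1, random_state=0).fit(X)
b = KMeans(n_clusters=4, n_init=1, random_state=0).fit(X)

# Same seed, same initial centers, so the labelings should match.
same = np.array_equal(a.labels_, b.labels_)
print(same)
```

With random_state=None the initial centers depend on the global NumPy random state, so repeated fits may number or even place the clusters differently.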

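copy_x: boolean, default: True. If True, the original data is not modified; if False, the data is centered in place and restored before returning, which may introduce small numerical differences.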

n_jobs: number of jobs to use for parallel computation, int, default: 1.

algorithm: 'auto', 'full' (classical EM-style) or 'elkan' (uses the triangle inequality), default: 'auto' (chooses 'elkan' for dense data and 'full' for sparse data).
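The algorithm choice affects speed, not the solution being sought. As a rough check, the sketch below fits the same synthetic blobs with the library default and with 'elkan' and compares the centers. The data and the helper sorted_centers are assumptions introduced for this example; 'full' is not used because its name may differ in newer scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Three tight, well-separated blobs (assumed data for this sketch).
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [6, 0], [3, 5])])

default_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
elkan_km = KMeans(n_clusters=3, n_init=10, random_state=0,
                  algorithm='elkan').fit(X)

def sorted_centers(km):
    """Hypothetical helper: order centers by x, then y, to ignore numbering."""
    c = km.cluster_centers_
    return c[np.lexsort((c[:, 1], c[:, 0]))]

print(sorted_centers(default_km))
print(sorted_centers(elkan_km))
```

On dense, well-separated data like this, both variants should land on essentially the same centers.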

Examples:

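from sklearn.cluster import KMeans
import numpy as np

# Eight 2-D points forming two groups, around x = 0 and x = 4
X = np.array([[0, 0], [0, 2], [-1, 1], [1, 1],
              [4, 0], [4, 2], [3, 1], [5, 1]])

kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# Cluster index assigned to each training point
# (the 0/1 numbering may differ between scikit-learn versions)
print(kmeans.labels_)
# [1 1 1 1 0 0 0 0]

# Assign new points to the nearest learned center
print(kmeans.predict([[0, -1], [4, 4]]))
# [1 0]

# Coordinates of the two cluster centers
print(kmeans.cluster_centers_)
# [[4. 1.]
#  [0. 1.]]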
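The labels and the fitted inertia_ attribute can be reproduced by hand from cluster_centers_, which makes it clearer what the model stores. This is a sketch under the same toy data as the example; the names d2, manual_labels and manual_inertia are illustrative, and n_init is set explicitly as an assumption to keep the result independent of the installed version.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0], [0, 2], [-1, 1], [1, 1],
              [4, 0], [4, 2], [3, 1], [5, 1]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Squared Euclidean distance from every point to every center, shape (8, 2).
d2 = ((X[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)

# Each point belongs to its nearest center ...
manual_labels = d2.argmin(axis=1)
# ... and inertia is the sum of squared distances to those nearest centers.
manual_inertia = d2.min(axis=1).sum()

print(manual_labels, kmeans.labels_)
print(manual_inertia, kmeans.inertia_)
```

Every point here lies at distance 1 from its center, so both inertia values come out as 8 up to floating-point error.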