13 - 2 - K-Means Algorithm (13 min).srt
(forked from fengdu78/Coursera-ML-AndrewNg-Notes)
1
00:00:00,300 --> 00:00:02,220
In the clustering problem we are
在聚类问题中
(Subtitles compiled by: Huang Haiguang, Ocean University of China, [email protected])
2
00:00:02,360 --> 00:00:03,630
given an unlabeled data
我们有未加标签的数据
3
00:00:03,950 --> 00:00:05,040
set and we would like
我们希望有一个算法
4
00:00:05,200 --> 00:00:06,480
to have an algorithm automatically
能够自动的
5
00:00:07,320 --> 00:00:08,700
group the data into coherent
把这些数据分成
6
00:00:09,340 --> 00:00:11,000
subsets or into coherent clusters for us.
有紧密关系的子集或是簇
7
00:00:12,380 --> 00:00:14,160
The K Means algorithm is by
K均值 (K-means) 算法
8
00:00:14,310 --> 00:00:15,860
far the most popular, by
是现在最为广泛使用的
9
00:00:16,090 --> 00:00:17,410
far the most widely used clustering
聚类方法
10
00:00:17,780 --> 00:00:19,380
algorithm, and in this
那么在这个视频中
11
00:00:19,550 --> 00:00:20,320
video I would like to tell
我将会告诉你
12
00:00:20,570 --> 00:00:23,400
you what the K Means Algorithm is and how it works.
什么是K均值算法以及它是怎么运作的
13
00:00:27,000 --> 00:00:29,310
The K means clustering algorithm is best illustrated in pictures.
K均值算法最好用图来表达
14
00:00:29,960 --> 00:00:30,770
Let's say I want to take
如图所示
15
00:00:31,080 --> 00:00:32,330
an unlabeled data set like
现在我有一些
16
00:00:32,490 --> 00:00:34,040
the one shown here, and I
没加标签的数据
17
00:00:34,100 --> 00:00:36,450
want to group the data into two clusters.
而我想将这些数据分成两个簇
18
00:00:37,710 --> 00:00:38,740
If I run the K Means clustering
现在我执行K均值算法
19
00:00:39,080 --> 00:00:41,560
algorithm, here is what
方法是这样的
20
00:00:41,910 --> 00:00:44,190
I'm going to do. The first step is to randomly initialize two
首先我随机选择两个点
21
00:00:44,410 --> 00:00:45,920
points, called the cluster centroids.
这两个点叫做聚类中心 (cluster centroids)
22
00:00:46,700 --> 00:00:48,170
So, these two crosses here,
就是图上边的两个叉
23
00:00:49,010 --> 00:00:51,730
these are called the Cluster Centroids
这两个就是聚类中心
24
00:00:53,270 --> 00:00:54,320
and I have two of them
为什么要两个点呢
25
00:00:55,100 --> 00:00:57,840
because I want to group my data into two clusters.
因为我希望聚出两个类
26
00:00:59,130 --> 00:01:02,400
K Means is an iterative algorithm and it does two things.
K均值是一个迭代方法 它要做两件事情
27
00:01:03,480 --> 00:01:04,790
First is a cluster assignment
第一个是簇分配
28
00:01:05,330 --> 00:01:07,800
step, and second is a move centroid step.
第二个是移动聚类中心
29
00:01:08,360 --> 00:01:09,630
So, let me tell you what those things mean.
我来告诉你这两个是干嘛的
30
00:01:11,170 --> 00:01:12,520
The first of the two steps in the
在K均值算法的每次循环中
31
00:01:12,700 --> 00:01:14,930
loop of K means, is this cluster assignment step.
第一步是要进行簇分配
32
00:01:15,840 --> 00:01:17,070
What that means is that, it's
这就是说
33
00:01:17,220 --> 00:01:18,360
going through each of the
我要遍历所有的样本
34
00:01:18,700 --> 00:01:19,880
examples, each of these green
就是图上所有的绿色的点
35
00:01:20,170 --> 00:01:22,120
dots shown here and depending
然后依据
36
00:01:22,580 --> 00:01:24,140
on whether it's closer to the
每一个点
37
00:01:24,350 --> 00:01:25,530
red cluster centroid or the
是更接近红色的这个中心
38
00:01:25,620 --> 00:01:27,390
blue cluster centroid, it is going
还是蓝色的这个中心
39
00:01:27,560 --> 00:01:28,570
to assign each of the
来将每个数据点
40
00:01:28,670 --> 00:01:30,670
data points to one of the two cluster centroids.
分配到两个不同的聚类中心中
41
00:01:32,040 --> 00:01:33,350
Specifically, what I mean
具体来讲
42
00:01:33,460 --> 00:01:34,610
by that, is to go through your
我指的是
43
00:01:34,730 --> 00:01:36,930
data set and color each
对数据集中的所有点
44
00:01:37,130 --> 00:01:38,510
of the points either red or
依据他们
45
00:01:38,810 --> 00:01:39,890
blue, depending on whether
更接近红色这个中心
46
00:01:40,160 --> 00:01:41,060
it is closer to the red
还是蓝色这个中心
47
00:01:41,170 --> 00:01:42,150
cluster centroid or the blue
进行染色
48
00:01:42,470 --> 00:01:45,210
cluster centroid, and I've done that in this diagram here.
染色之后的结果如图所示
49
00:01:46,930 --> 00:01:48,700
So, that was the cluster assignment step.
以上就是簇分配的步骤
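
As an illustration of the cluster assignment step just described (not from the lecture itself), here is a minimal NumPy sketch; the data array X and the centroid names mu_red and mu_blue are hypothetical choices for this example.

import numpy as np

# Hypothetical unlabeled data set: one example per row, two features each.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])

# Two randomly initialized cluster centroids (the "red" and "blue" crosses).
mu_red = np.array([0.0, 0.0])
mu_blue = np.array([6.0, 6.0])

# Cluster assignment step: colour each example red or blue depending on
# which of the two cluster centroids it is closer to.
dist_red = np.linalg.norm(X - mu_red, axis=1)
dist_blue = np.linalg.norm(X - mu_blue, axis=1)
colors = np.where(dist_red <= dist_blue, "red", "blue")
print(colors)   # ['red' 'red' 'blue' 'blue']
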
50
00:01:49,780 --> 00:01:52,270
The other part of K means, in the
K均值的另一部分
51
00:01:52,410 --> 00:01:53,390
loop of K means, is the move
是要移动聚类中心
52
00:01:53,590 --> 00:01:54,860
centroid step, and what
具体的操作方法
53
00:01:55,020 --> 00:01:55,730
we are going to do is, we
是这样的
54
00:01:55,800 --> 00:01:56,890
are going to take the two cluster centroids,
我们将两个聚类中心
55
00:01:57,390 --> 00:01:58,550
that is, the red cross and
也就是说红色的叉
56
00:01:58,830 --> 00:02:00,270
the blue cross, and we are
和蓝色的叉
57
00:02:00,420 --> 00:02:01,420
going to move them to the average
移动到
58
00:02:02,070 --> 00:02:03,900
of the points colored the same colour.
和它一样颜色的那堆点的均值处
59
00:02:04,880 --> 00:02:05,700
So what we are going
那么我们要做的是
60
00:02:05,730 --> 00:02:06,510
to do is look at all the
找出所有红色的点
61
00:02:06,630 --> 00:02:07,810
red points and compute the
计算出它们的均值
62
00:02:08,240 --> 00:02:09,520
average, really the mean
就是所有红色的点
63
00:02:10,080 --> 00:02:11,500
of the location of all the red points,
平均下来的位置
64
00:02:11,650 --> 00:02:13,690
and we are going to move the red cluster centroid there.
然后我们就把红色点的聚类中心移动到这里
65
00:02:14,190 --> 00:02:15,260
And the same things for the
蓝色的点也是这样
66
00:02:15,460 --> 00:02:16,370
blue cluster centroid, look at all
找出所有蓝色的点
67
00:02:16,560 --> 00:02:17,720
the blue dots and compute their
计算它们的均值
68
00:02:17,840 --> 00:02:19,710
mean, and then move the blue cluster centroid there.
把蓝色的叉放到那里
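
The move centroid step can be sketched the same way (again, the variable names continue the hypothetical example above rather than anything from the lecture): each centroid is moved to the mean position of the points that currently carry its colour.

import numpy as np

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
colors = np.array(["red", "red", "blue", "blue"])   # from the assignment step

# Move centroid step: each centroid jumps to the average (mean) location
# of the examples assigned to it.
mu_red = X[colors == "red"].mean(axis=0)    # mean of all the red points
mu_blue = X[colors == "blue"].mean(axis=0)  # mean of all the blue points
print(mu_red, mu_blue)                      # [1.1 0.9] [5.1 4.9]
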
69
00:02:20,320 --> 00:02:20,880
So, let me do that now.
那我们现在就这么做
70
00:02:21,170 --> 00:02:22,990
We're going to move the cluster centroids as follows
我们将按照图上所示这么移动
71
00:02:24,590 --> 00:02:27,350
and I've now moved them to their new means.
现在两个中心都已经移动到新的均值那里了
72
00:02:28,300 --> 00:02:29,760
The red one moved like that
你看
73
00:02:29,820 --> 00:02:31,350
and the blue one moved
蓝色的这么移动
74
00:02:31,510 --> 00:02:34,460
like that and the red one moved like that.
红色的这么移动
75
00:02:34,620 --> 00:02:35,460
And then we go back to another cluster
然后我们就会进入下一个
76
00:02:35,910 --> 00:02:36,920
assignment step, so we're again
簇分配
77
00:02:37,190 --> 00:02:38,090
going to look at all of
我们重新检查
78
00:02:38,160 --> 00:02:39,670
my unlabeled examples and depending
所有没有标签的样本
79
00:02:40,090 --> 00:02:42,840
on whether it's closer to the red or the blue cluster centroid,
依据它离红色中心还是蓝色中心更近一些
80
00:02:43,340 --> 00:02:45,150
I'm going to color them either red or blue.
将它染成红色或是蓝色
81
00:02:45,640 --> 00:02:47,160
I'm going to assign each point
我要将每个点
82
00:02:47,530 --> 00:02:48,550
to one of the two cluster centroids, so let me do that now.
分配给两个中心的某一个 就像这么做
83
00:02:51,450 --> 00:02:52,260
And so the colors of some of the points just changed.
你看某些点的颜色变了
84
00:02:53,400 --> 00:02:55,690
And then I'm going to do another move centroid step.
然后我们又要移动聚类中心
85
00:02:56,040 --> 00:02:56,810
So I'm going to compute the
于是我计算
86
00:02:57,070 --> 00:02:57,880
average of all the blue points,
蓝色点的均值
87
00:02:58,110 --> 00:02:59,000
compute the average of all
还有红色点的均值
88
00:02:59,040 --> 00:03:00,360
the red points and move my
然后就像图上所表示的
89
00:03:00,480 --> 00:03:03,770
cluster centroids like this, and
移动两个聚类中心
90
00:03:03,930 --> 00:03:05,650
so, let's do that again.
来我们再来一遍
91
00:03:06,160 --> 00:03:07,810
Let me do one more cluster assignment step.
下面我还是要做一次簇分配
92
00:03:08,320 --> 00:03:09,450
So colour each point red
将每个点
93
00:03:09,620 --> 00:03:10,840
or blue, based on what it's
染成红色或是蓝色
94
00:03:11,170 --> 00:03:13,070
closer to and then
依然根据它们离那个中心近
95
00:03:13,310 --> 00:03:20,000
do another move centroid step and we're done.
然后是移动中心 你看就像这样
96
00:03:20,350 --> 00:03:21,230
And in fact if you
实际上
97
00:03:21,290 --> 00:03:23,250
keep running additional iterations of
如果你从这一步开始
98
00:03:23,500 --> 00:03:26,020
K means from here the
一直迭代下去
99
00:03:26,160 --> 00:03:27,240
cluster centroids will not change
聚类中心是不会变的
100
00:03:27,540 --> 00:03:28,770
any further and the colours of
并且
101
00:03:28,830 --> 00:03:29,760
the points will not change any
那些点的颜色也不会变
102
00:03:29,940 --> 00:03:31,520
further. And so, this is
在这时
103
00:03:31,810 --> 00:03:33,520
the, at this point,
我们就能说
104
00:03:33,770 --> 00:03:35,290
K means has converged and it's
K均值方法已经收敛了
105
00:03:35,400 --> 00:03:36,430
done a pretty good job finding
在这些数据中找到两个簇
106
00:03:37,470 --> 00:03:38,750
the two clusters in this data.
K均值表现的很好
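
Putting the two steps together, the loop below is a rough sketch (my own, assuming NumPy and a data matrix X with one example per row) of repeating cluster assignment and centroid moves until the centroids stop changing, which is the convergence behaviour just described.

import numpy as np

def k_means(X, K, max_iters=100, seed=0):
    """Minimal K-means sketch; X is an (m, n) array of unlabeled examples."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centroids (here: K random training examples).
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to x(i).
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move centroid step: each mu_k becomes the mean of its assigned points.
        # (Empty clusters are not handled in this sketch.)
        new_mu = np.array([X[c == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_mu, mu):   # centroids no longer move: converged
            break
        mu = new_mu
    return c, mu

# Two well-separated blobs, clustered into K = 2 groups.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5.0])
assignments, centroids = k_means(X, K=2)
print(centroids)
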
107
00:03:39,360 --> 00:03:40,310
Let's write out the K means algorithm more formally.
来我们用更加规范的格式描述K均值算法
108
00:03:42,150 --> 00:03:43,930
The K means algorithm takes two inputs.
K均值算法接受两个输入
109
00:03:44,570 --> 00:03:46,200
One is a parameter K,
第一个是参数K
110
00:03:46,450 --> 00:03:47,260
which is the number of clusters
表示你想从数据中
111
00:03:47,830 --> 00:03:48,900
you want to find in the data.
聚类出的簇的个数
112
00:03:49,640 --> 00:03:50,820
I'll later say how we might
我一会儿会讲到
113
00:03:51,170 --> 00:03:53,290
go about trying to choose K, but
我们可以怎样选择K
114
00:03:53,470 --> 00:03:54,600
for now let's just say that
这里呢 我们只是说
115
00:03:55,110 --> 00:03:56,210
we've decided we want a
我们已经确定了
116
00:03:56,360 --> 00:03:57,600
certain number of clusters and we're
需要几个簇
117
00:03:57,690 --> 00:03:58,810
going to tell the algorithm how many
然后我们要告诉这个算法
118
00:03:59,040 --> 00:04:00,730
clusters we think there are in the data set.
我们觉得在数据集里有多少个簇
119
00:04:01,170 --> 00:04:02,120
And then K means also
K均值同时要
120
00:04:02,490 --> 00:04:03,430
takes as input this sort
接收另外一个输入
121
00:04:03,880 --> 00:04:05,060
of unlabeled training set of
那就是只有 x 的
122
00:04:05,250 --> 00:04:06,530
just the Xs and
没有标签 y 的训练集
123
00:04:06,710 --> 00:04:08,430
because this is unsupervised learning, we
因为这是非监督学习
124
00:04:08,520 --> 00:04:10,690
don't have the labels Y anymore.
我们用不着 y
125
00:04:10,980 --> 00:04:12,470
And for unsupervised learning of
同时在非监督学习的
126
00:04:12,740 --> 00:04:14,020
the K means I'm going to
K均值算法里
127
00:04:14,550 --> 00:04:16,160
use the convention that XI
我们约定
128
00:04:16,420 --> 00:04:17,750
is an RN dimensional vector.
x(i) 是一个n维向量
129
00:04:18,280 --> 00:04:19,190
And that's why my training examples
这就是
130
00:04:19,750 --> 00:04:22,460
are now n dimensional rather than n plus one dimensional vectors.
训练样本是 n 维而不是 n+1 维的原因
131
00:04:24,340 --> 00:04:25,430
This is what the K means algorithm does.
这就是K均值算法
132
00:04:27,180 --> 00:04:28,630
The first step is that it
第一步是
133
00:04:28,790 --> 00:04:31,170
randomly initializes K cluster
随机初始化 K 个聚类中心
134
00:04:31,570 --> 00:04:33,550
centroids which we will
记作
135
00:04:33,820 --> 00:04:34,610
call mu 1, mu 2, up
μ1, μ2 一直到 μk
136
00:04:34,840 --> 00:04:36,250
to mu K. And so
就像之前
137
00:04:36,650 --> 00:04:38,450
in the earlier diagram, the
图中所示
138
00:04:38,550 --> 00:04:40,770
cluster centroids corresponded to the
聚类中心对应于
139
00:04:41,060 --> 00:04:42,240
location of the red cross
红色叉和蓝色叉
140
00:04:42,660 --> 00:04:44,020
and the location of the blue cross.
所在的位置
141
00:04:44,410 --> 00:04:45,640
So there we had two cluster
于是我们有两个聚类中心
142
00:04:45,960 --> 00:04:47,000
centroids, so maybe the red
按照这样的记法
143
00:04:47,170 --> 00:04:48,470
cross was mu 1
红叉是 μ1
144
00:04:48,650 --> 00:04:49,940
and the blue cross was mu
蓝叉是 μ2
145
00:04:50,300 --> 00:04:51,360
2, and more generally we would have
通常情况下
146
00:04:51,820 --> 00:04:53,830
k cluster centroids rather than just 2.
我们可能会有比2要多的聚类中心
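
As a concrete (and purely illustrative) picture of these inputs, the unlabeled training set of m examples in R^n can be stored as an m-by-n array with no label vector y, and the K cluster centroids mu_1 through mu_K as the rows of a K-by-n array; picking K random training examples, as below, is just one possible way to carry out the random initialization mentioned here.

import numpy as np

# Inputs to K-means: the number of clusters K, and an unlabeled training
# set of m examples x(1), ..., x(m), each x(i) in R^n (no labels y).
m, n, K = 300, 2, 2
X = np.random.randn(m, n)                 # hypothetical unlabeled data

# Randomly initialize K cluster centroids mu_1, ..., mu_K
# (here: K distinct training examples picked at random).
rng = np.random.default_rng(0)
mu = X[rng.choice(m, size=K, replace=False)]
print(mu.shape)                           # (K, n): one centroid per row
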
147
00:04:54,520 --> 00:04:56,240
Then the inner loop
K均值的内部循环
148
00:04:56,520 --> 00:04:57,360
of k means does the following,
是这样的
149
00:04:57,830 --> 00:04:59,020
we're going to repeatedly do the following.
我们会重复做下面的事情
150
00:05:00,070 --> 00:05:01,950
First for each of
首先
151
00:05:02,160 --> 00:05:03,920
my training examples, I'm going
对于每个训练样本
152
00:05:04,110 --> 00:05:05,950
to set this variable CI
我们用变量 c(i) 表示
153
00:05:06,790 --> 00:05:07,960
to be the index 1 through
K个聚类中心中最接近 x(i) 的
154
00:05:08,170 --> 00:05:10,520
K of the cluster centroid closest to XI.
那个中心的下标
155
00:05:11,170 --> 00:05:13,810
So this was my cluster assignment
这就是簇分配
156
00:05:14,330 --> 00:05:16,870
step, where we
这个步骤
157
00:05:17,000 --> 00:05:18,680
took each of my examples and
我先将每个样本
158
00:05:18,980 --> 00:05:20,740
coloured it either red
依据它离那个聚类中心近
159
00:05:21,050 --> 00:05:22,050
or blue, depending on which
将其染成
160
00:05:22,380 --> 00:05:23,940
cluster centroid it was closest to.
红色或是蓝色
161
00:05:24,140 --> 00:05:25,090
So CI is going to be
所以 c(i) 是一个
162
00:05:25,280 --> 00:05:26,280
a number from 1 to
在1到 K 之间的数
163
00:05:26,380 --> 00:05:27,680
K that tells us, you
而且它表明
164
00:05:27,780 --> 00:05:28,760
know, is it closer to the
这个点到底是
165
00:05:28,920 --> 00:05:29,820
red cross or is it
更接近红色叉
166
00:05:29,900 --> 00:05:31,170
closer to the blue cross,
还是蓝色叉
167
00:05:32,200 --> 00:05:33,210
and another way of writing this
另一种表达方式是
168
00:05:33,580 --> 00:05:35,350
is I'm going to,
我想要计算 c(i)
169
00:05:35,620 --> 00:05:37,820
to compute Ci, I'm
那么
170
00:05:37,890 --> 00:05:39,120
going to take my Ith
我要用第i个样本x(i)
171
00:05:39,380 --> 00:05:41,170
example Xi and
然后
172
00:05:41,360 --> 00:05:42,670
I'm going to measure its distance
计算出这个样本
173
00:05:43,900 --> 00:05:44,860
to each of my cluster centroids,
距离所有K个聚类中心的距离
174
00:05:45,410 --> 00:05:46,690
this is mu and then
这是 μ
175
00:05:47,060 --> 00:05:48,640
lower-case k, right, so
以及小写的k
176
00:05:48,890 --> 00:05:50,630
capital K is the total
大写的 K 表示
177
00:05:50,910 --> 00:05:51,900
number of centroids and I'm going
所有聚类中心的个数
178
00:05:52,100 --> 00:05:53,160
to use lower case k here
小写的 k 则是
179
00:05:53,770 --> 00:05:55,140
to index into the different centroids.
不同的中心的下标
180
00:05:56,240 --> 00:05:58,470
But so, Ci is going to, I'm going
我希望的是
181
00:05:58,550 --> 00:06:00,110
to minimize over my values
在所有K个中心中
182
00:06:00,550 --> 00:06:01,930
of k and find the
找到一个k
183
00:06:02,120 --> 00:06:03,650
value of K that minimizes this
使得xi到μk的距离
184
00:06:03,900 --> 00:06:04,750
distance between Xi and the
是xi到所有的聚类中心的距离中
185
00:06:04,800 --> 00:06:06,130
cluster centroid, and then,
最小的那个
186
00:06:06,340 --> 00:06:08,990
you know, the
也就是说
187
00:06:09,070 --> 00:06:10,350
value of k that minimizes
k的值使这个最小
188
00:06:10,940 --> 00:06:12,160
this, that's what gets set in
这就是计算ci的方法
189
00:06:12,300 --> 00:06:14,100
Ci. So, here's
这里还有
190
00:06:14,360 --> 00:06:16,470
another way of writing out what Ci is.
另外的表示ci的方法
191
00:06:18,050 --> 00:06:19,150
If I write the norm between
我用xi减μk的范数
192
00:06:19,270 --> 00:06:21,500
Xi minus Mu-k,
来表示
193
00:06:23,000 --> 00:06:24,120
then this is the distance between
这是第i个训练样本
194
00:06:24,630 --> 00:06:26,040
my ith training example
到聚类中心μk
195
00:06:26,180 --> 00:06:27,350
Xi and the cluster centroid
的距离
196
00:06:28,140 --> 00:06:30,280
Mu subscript K, this is--this
注意
197
00:06:31,150 --> 00:06:32,830
here, that's a lowercase K. So uppercase
我这里用的是小写的k
198
00:06:33,320 --> 00:06:34,710
K is going to be
大写的K
199
00:06:34,980 --> 00:06:36,210
used to denote the total
大写的k表示
200
00:06:36,450 --> 00:06:38,020
number of cluster centroids,
聚类中心的总数
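
The assignment rule being built up here, c(i) = the value of lowercase k (from 1 to capital K) that minimizes the distance ||x(i) - mu_k||, can be sketched as follows; this is my own illustration, and in code the centroid indices naturally run from 0 to K-1 rather than from 1 to K.

import numpy as np

def cluster_assignment(X, mu):
    """c[i] = argmin over k of ||x(i) - mu_k||
    (minimizing the distance or the squared distance picks the same k)."""
    # dists[i, k] = distance from example i to centroid k
    dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    return dists.argmin(axis=1)           # 0-based index of the closest centroid

X = np.array([[0.0, 0.0], [4.0, 4.0], [4.5, 3.5]])   # three examples
mu = np.array([[0.0, 0.0], [4.0, 4.0]])              # K = 2 centroids
print(cluster_assignment(X, mu))                     # [0 1 1]
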