From Wikipedia, the free encyclopedia.
Clustering has two different meanings in computer science:
- In computer hardware, clustering is the connection of many low-cost computers using special software so that they can be used as one larger computer. Clustering can be used either to provide reliability (when one machine fails, the others take over its workload) or to provide large amounts of computing power inexpensively.
- In machine learning, clustering is the task of taking a data set of inputs and dividing it into groups (equivalence classes), so that the inputs in each group are "similar" in some way. Several different algorithms exist for this task.
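As an illustration of the machine learning sense, the following is a minimal sketch of one well-known clustering algorithm, k-means, applied to two-dimensional points. The function name `kmeans`, the choice of squared Euclidean distance as the measure of "similarity", and the seeding of the centroids from the first k points are assumptions made for this example; they are not the only way to cluster data.

```python
def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with the basic k-means iteration.

    Assumption for this sketch: the first k points serve as the initial
    centroids. Real implementations usually pick them more carefully.
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # leave the centroid in place if its cluster is empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for cluster in kmeans(points, 2):
    print(cluster)
```

Here the inputs that end up in the same cluster are "similar" in the sense of lying close together in the plane; other similarity measures lead to other groupings.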
Clustering in biology has two main applications, both in the fields of computational biology and bioinformatics:
- In proteomics, clustering is used to build groups of proteins with related expression patterns. Such groups often contain functionally related proteins, and thus high-throughput experiments using expressed sequence tags (ESTs) can be a powerful tool for genome annotation, a general aspect of genomics.
- In sequence analysis, clustering is used to group homologous sequences into gene families. This is a very important concept in bioinformatics and in evolutionary biology in general. See evolution by gene duplication.
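The grouping of sequences into families can be sketched as follows. This is a simplified illustration, not a method used by any particular tool: it assumes the sequences are already aligned and of equal length, uses the fraction of identical positions as a crude stand-in for homology, and joins any two sequences whose identity reaches a hypothetical `threshold` (single-linkage clustering). The names `identity` and `cluster_sequences` are invented for this example.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_sequences(seqs, threshold=0.8):
    """Single-linkage clustering: sequences joined by a chain of
    sufficiently similar pairs end up in the same family."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two families

    families = {}
    for i, s in enumerate(seqs):
        families.setdefault(find(i), []).append(s)
    return list(families.values())


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
for family in cluster_sequences(seqs):
    print(family)
```

In practice, homology is inferred from alignment scores rather than from a raw count of matching positions, so the identity function above would be replaced by a proper sequence comparison.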

