Notes on Knowledge Discovery from Databases
I.Savnik
February, 1999
Some links related to KDD
Introduction
- overall process of data mining
- critical steps in the data mining process
- data preprocessing
- data classification
- data clustering
- visualisation
- time serial data analysis
- formalising data mining problems
- prediction and forecasting
Literature:
- Brachman, R.J., Anand, T., "The process of knowledge discovery
in databases", Advances in Knowledge Discovery and Data Mining,
Fayyad, U.M., Piatetsky-Shapiro, G., Symith, P., Uthurusamy, R.,
Editors, AAAI Press/The MIT Press, 1996.
- The Online School on Inductive Logic Programming and Knowledge
Discovery in Databases, Jozef Stefan Institute,
http://www-ai.ijs.si/~ilpnet/ilpkdd/
Logical conjunctions
(Introduction to algorithms used in KDD)
- Knowledge representation
- General-to-specific ordering
- Non-incremental algorithms
- exhaustive general-to-specific algorithm
- heuristic general-to-specific algorithm
- examples:
AQ11 (Michalski et al. 1980),
Beam-search (Clark and Niblett 1989)
- Incremental algorithms
- incremental general-to-specific algorithm (covering algorithms)
- examples: Candidate-Elimination (Mitchell 1997)
- incremental hill-climbing algorithm
Literature:
- Langley, P., "Elements of machine learning", Morgan Kaufmann
Publishers, 1996.
- Mitchell, T.M., "Machine learning", McGraw Hill Series in
Computer Science, 1997.
Decision trees
- Representational language
- concept hierarchies, decision trees
- examples of concept hierarchies
- Non-incremental algorithms
- top-down (divisive) induction of decision trees
- ID3 algorithm, C4.5 extensions (Quinlan 1986)
- Incremental algorithms
- incremental induction of decision trees
- Pruning decision trees
- Transforming decision trees to rules
Tools:
Literature:
- Quinlan,J.R, "Induction of decision trees", Machine Learning,
1(1), 1986.
- Quinlan, J.R., "C4.5: Programs for Machine Learning"
Morgan Kauffman, 1993.
- Mitchell, T.M., "Machine learning", McGraw Hill Series in
Computer Science, 1997.
- Langley, P., "Elements of machine learning", Morgan Kaufmann
Publishers, 1996.
Decision lists
- Knowledge representation
- sets of rules
- decision lists
- ordered list of rules
- mutually exclusive rules (*)
- Sequential covering algorithms
- rules expressed in propositional logic
- non-incremental algorithms
- AQ algorithm (Michalski 1980)
- CN2 algorithm (Beam search)
(Clark and Niblett 1989)
- Inductive logic programming
- rules expressed in first-order logic, Horn clauses
- FOIL algorithm (Quinlan 1990)
- inverted resolution, CIGOL (Muggleton and Buntine 1988)
Tools:
- CN2 (Boswell,Clark)
- FOIL (Quinlan)
Literature:
- Mitchell, T.M., "Machine learning", McGraw Hill Series in
Computer Science, 1997.
- Clark,P., Niblett,T., "The CN2 Induction Algorithm",
Machine Learning, Vol.3, No.4, 1989, pp.261-283.
- Quinlan,J.R., "Learning logical definition from relations",
Machine Learning, 5, 1990.
- Muggleton,S., Buntine,W., "Machine invention of first-order
predicates by inverting resolution", Proc. of Machine Learning
Conf., 1988.
- Langley, P., "Elements of machine learning", Morgan Kaufmann
Publishers, 1996.
Association rules
- Representational language
- association rules
- definition of association rules
- examples of association rules
- Algorithms for the discovery of association rules
- AIS algorithm (Agrawal et al. 1993)
- Apriori and AprioriTid algorithms
(Agrawal et al. 1996)
- non-incremental (top-down) algorithms
- algorithms similar to levelwise algorithm
Literature:
- Agrawal,R., Imielinski,T., Swami,A., "Mining association rules
between sets of items in large databases", ACM SIGMOD, 1993.
- Agrawal,R., Mannila,H., Srikant,R., Tiovonen,H., Verkamo,A.I.,
"Fast discovery of association rules", Advances in knowledge
discovery and data mining, Fayyad, U.M., Piatetsky-Shapiro,
G., Symith, P., Uthurusamy, R.,
Editors, AAAI Press/The MIT Press, 1996.
- Srikant,R., "Fast algorithms for mining association rules and
sequential patterns", Ph.D. Thesis, University of Wisconsin,
1996.
Time serial data analysis
- Knowledge representation
- episodes (sequential patterns in time-series data)
- frequency of episode
- ordering of episodes, sub-episodes
- Algorithms for the discovery of frequent episodes
- WINEPI algorithm
- episode lattice
- levelwise algorithm (non-incremental)
Literature:
- Mannila.H., Toivonen,H., Verkamo,A.I., "Discovery of Frequent
Episodes in Event Sequences", Data Mining and Knowledge
Discovery Journal, Vol.1, Issue 3, 1997.
Theoretical framework
- Ordering of the sentences
- general-to-specific poset
- Theories and borders
- Model theory
- definition of the discovery problem
- Discovery of interesting sentences
- Levelwise algorithm (Mannila and Tiovonen 1997)
- Candidate-Elimination algorithm (Mitchell 1997)
- Examples of theories
Literature:
- Mannila,H., Tiovonen,H., "Levelwise search and borders of
theories in knowledge discovery", Data Mining and Knowledge
Discovery Journal, Volume 1, Issue 3, 1997, pp. 241-258.
- Mitchell, T.M., "Machine learning", McGraw Hill Series in
Computer Science, 1997.
Last update: Fri Feb 12 08:43:43 MET 1999