Guia de Cursos

Queres conhecer a oferta de cursos da NOVA, nas áreas das licenciaturas, mestrados e doutoramentos?
No nosso Guia de Cursos encontras informação útil sobre Faculdades, Institutos e Escolas.
Podes ainda aceder a informações complementares necessárias a uma completa integração.

saber mais Guia de Cursos

Instituto Superior de Estatística e Gestão de Informação

Data Mining

Código

82033

Unidade Orgânica

Instituto Superior de Estatística e Gestão de Informação

Créditos

6.0

Professor responsável

Fernando José Ferreira Lucas Bação

Horas semanais

45.0

Língua de ensino

Português. No caso de existirem alunos de Erasmus, as aulas serão leccionadas em Inglês

Objectivos

The goal of the course is introducing students to methods for solving complex problems,
involving the use and the uderstanding of large sets of data,
often globally known as Data Mining.
Among the methods that will be studied, particular relevance will be given to Evolutionary
Algorithms (Holland, 1975; Goldberg, 1989) and Neural Networks (Haykin, 1999).
The two main features these methods share are the ability of returning approximate solutions
(in case they are not able to find a perfect one) to a problem and the ability of dealing
with concepts such as imperfection, imprecision, and incomplete information.
Other computational methods are also discussed in this course, like various forms of
Local Search (like Hill Climbing (Aarts and Korst, 1998), Simulated Annealing (Kirkpatrick et al., 1983),
Tabu Search (Glover and Laguna, 1997)) and Particle Swarm Optimization (Clerc, 2006),
a new method for the optimization of continuous values inspired by the collective
behavior of swarms (and belonging to the field of Swarm Intelligence).
All the methods presented in this course are also traditionally classified as belonging to
the wider field of Machine Learning (Mitchell, 1996). Thus an important part of this course
is dedicated to introducing the fundations of Machine Learning.

Pré-requisitos

The course will be held in English language.
No particular propaedeutic knowledge is requested for attending this course,
except a general knowledge of the fundaments of Computer Science and the basics
of Mathematics, in particular Statistics.

Conteúdo

1 Introduction and Motivation

  • Complex problems and NP-hard problems.
  • Large datasets and complex problems
  • What is data mining?
  • Definition and foundations of Machine Learning.
  • Outline of typical Data Mining applications.
2 Search Spaces and Optimization
  • Optimization Problems and their complexity. Are they hard? Why?
  • No Free Lunch Theorem.
  • Introduction to Local Search.
  • Hill Climbing.
  • Fitness Landscapes: their meaning and importance; their structure and topology.
  • Simulated Annealing
  • general definitions and functioning.
  • theory and asymptotic behavior.
  • Tabu Search
  • general definitions and functioning.
  • short term and long term memory.
3 Genetic Algorithms
  • Outline of the basic concepts of Darwinian Evolution Theory.
  • Genetic operators.
  • General functioning of a simple Genetic Algorithm (GA).
  • Genotype and phenotype.
  • Selection mechanisms.
  • Asymptotic behavior of GAs.
  • Multiobjective optimization
  • VEGA (Shaffer)
  • The concepts of Pareto optimal set and Pareto front.
  • Experimental evaluation of the performance of a GA.
  • A general discussion of the possible applications.
4 Swarm Intelligence
  • Introduction and motivation.
  • Continuous optimization and real-valued parameters optimization.
  • Particle Swarm Optimization (PSO)
  • Basic (or "canonical") PSO version.
  • Possible extensions and their advantages and drawbacks.
5 Genetic Programming
  • Representation and main differences with GAs.
  • Genetic operators.
  • Fitness calculation methods.
  • Properties of Closure and Sufficiency.
  • The concept of Steady State.
  • Automatically Defined Functions (ADF).
  • Genetic Programming (GP) Benchmarks (even parity, multiplexer,symbolic
  • egression, artificial ant on the Santa Fe trail).
  • Open Issues in GP.
  • A general discussion of the possible applications.
6 Deepening of some general concepts of Machine Learning and Data Mining
  • The concept of learning.
  • Learning a function.
  • The concept of generalization.
  • Supervised and unsupervised learning.
  • The case of classification and clustering.
  • Performance of a classifier.
  • Data splitting.
  • Training, test and validation sets. Definitions and motivations.
  • Crossvalidation and its variants.
  • Precision and Recall. True positives, true negatives, false positives, false
  • egatives. F-measure and K-statistic.
  • The concept of feature and its importance.
  • The importance of feature selection.
  • Outline of the main feature selection methods.
  • Principal component based feature selection.
  • Correlation based feature selection.
7 Neural Networks
  • Introduction and Motivations.
  • A very light outline of the physical structure of human brain. Neurons.
  • Artificial neuron.
  • Perceptron
  • Single neuron model.
  • Perceptron learning rule.
  • Convergence theorem of perceptron.
  • The main activation functions.
  • Networks of neurons.
  • Adaline
  • General structure and functioning.
  • Delta learning rule. Gradient descent.
  • Linearly separable and non-linearly separable problems.
  • Hidden layer of neurons.
  • Universal Approximation Theorem.
  • Backpropagation. Formal justification and implementation.
  • Cyclic (or Recursive) Neural Networks.
  • Jordan networks.
  • Elman networks
  • Hopfield networks. The concept of associative memory. Hebb learning rule.
  • Non-Supervised Neural Networks
  • Competitive learning networks.
  • Kohonen networks (Self-Organizing Maps).

Bibliografia

Tettamanzi, M. Tomassini, "Soft Computing", Sprinter
M. Mitchell, "Introduzione agli algoritmi genetici", Apogeo
W. Banzhaf, P. Nordin, R. Keller and F. Francone, "Genetic Programming, an introduction", Morgan Kaufmann
T. Mitchell, Machine Learning, McGraw Hill

Método de ensino

Theoretical classes will be held using the blackboard
and projecting slides. Practical classes will be held
in computer rooms and laboratories, allowing the students
to apply the concepts that have been explained previously.

Método de avaliação

The examination consists in a written test concerning
all the contents of the course.

Cursos