Courses organized by Dipartimento di Matematica e Informatica

you must be logged-in to have full access to course material

Web Information Retrieval - PhD course

Course outline: 

The course will be an overall survey on Web Information Retrieval. The course is divided into two parts:

Part I: Information Retrieval (IR)
- Introduction to IR: what IR is, models, systems, evaluation, problems and difficulties.
- IR models: boolean, vector space, probabilistic, latent semantic indexing, neural nets.
- Inverted index: structure, construction, use.
- Query languages and reformulation techniques.
- Human Computer Interaction and IR: examples of user interfaces for IR.
- Clustering: main algorithms, usage in IR.
- Evaluation: relevance, test-collections, user studies, metrics.

Part II: Web IR
- Statistics on Web and Web users.
- Models of the Web Graph: random graphs, scale free and small world network, power laws, bow-tie, Web host graph.
- Link analysis for ranking (PageRank, HITS, etc.)
- Spam, duplications, mirrors.
- Search engine architecture.
- Crawling.

Since it's unlikely that we can tackle in detail all the above topics, I plan to select a subset of them (and maybe some additional topics) in agreement with course participants.

More details on the Google group - contact the lecturer to join.

Exam: TBD (probably a seminar after some individual study/work by the student)

Constraint Programming and NMR Constraints for Determining Protein Structure

Course outline: 

Lecture 1. Dovier
Introduction to the notions of constraint satisfaction problem (CSP) and constraint optimization problem (COP). Solutions' space and search trees. Some basic complexity notions. Two approaches: integer linear programming and local search.

Lecture 2. Fogolari
Protein modeling Introduction to protein structure Structural classification of protein structure Protein structure modeling Homology modeling Fold recognition Ab-initio modeling

Lecture 3. Dovier
Constraint programming. Constraint propagation. Node, Arc, and Bounds consistency and their extensions. Prop-labeling trees and Branch-and-bound. Global constraints: complete and approximate propagation. Encoding of some COP and CSP in Constraint Logic Programming.

Lecture 4. Esposito
Conformational constraints from NMR experiments Basic features of NMR spectra: resonance, coupling. Chemical shift - correllation with secondary and tertiary structure. Scalar coupling constant and correlation with local conformation. Dipolar coupling. Dipolar coupling in isotropic solution: NOE. Correlation of NOE with molecular mobility and internuclear distances. Dipolar coupling in anisotropic solution: DC and RDC. RDC and orientational restraints.

Lecture 5. Dovier
The protein structure prediction problem as Constraint Optimization Problem in CLP. The HP model in N^2 and the 20x20 model in FCC. Global constraints for discrete lattices.

Lecture 6. Corazza
Methods for structural determination from NMR restraints. Introduction to the methods used: distance geometry, molecular dynamics and simulated annealing MD in cartesian coordinate space and in torsional angle space. CYANA: an example of a software for molecular dynamics and simulated annealing simulation in torsional angle space. Outline on combined ab-initio/NMR restrained methods (Rosetta NMR) Validation criteria for NMR structures.

Systems Biology


Data Mining and Mathematical Programming

Course outline: 

The course has addressed some problems of Logical Analysis of Data (LAD) from the point of view of Mathematical Programming. Given a set T (positive observations) of m_1 binary vectors in R^n (n is the number of features entering the observation) and a set F (negative observations) of m_0 binary vectors in R^n, we want to build some tools such that a new observation can be classified as either positive or negative. One way to classify points is through patterns, i.e. vectors of the type (1,-,0,0,-,1), where the symbol - can be replaced by either 0 or 1. A pattern is a subcube of the unit dimensional cube. The support of a pattern is the set of features for which the pattern has value 0 or 1. A pattern covers an observation if they are equal on the support of the pattern. A positive pattern is a pattern covering at least one positive observation and no negative observation. A negative pattern is a pattern covering at least one negative observation and no positive observation. Several different requirements can be formulated for the patterns. We may prefer patterns over other patterns according to alternative criteria like simplicity, selectivity, coverage, minimal support, etc.. The different criteria have been investigated and integer linear programming models have been formulated to find solution meeting particular criteria. When the data are numbers and not necessarily binary digits, two alternative ways can be explored. We may either ``binarize'' the data and so the previous techniques apply, or we may solve directly the new problem. Both models have been investigated. Finally the separation problem of two sets of points in R^n has been considered. First the properties of separating linearly separable sets have been investigated (considering how different norms affect the solution) and it has been shown how to extend the separation to non linearly separable sets of points.