2019
On the Learnability of Software Router Performance via CPU Measurements Proceedings Article
Shelbourne, Charles; Linguaglossa, Leonardo; Lipani, Aldo; Zhang, Tianzhu; Geyer, Fabien
In: Proc. of CoNEXT, pp. 23–25, Association for Computing Machinery, Orlando, FL, USA, 2019, ISBN: 9781450370066.
@inproceedings{10.1145/3360468.3366776,
title = {On the Learnability of Software Router Performance via CPU Measurements},
author = {Charles Shelbourne and Leonardo Linguaglossa and Aldo Lipani and Tianzhu Zhang and Fabien Geyer},
url = {https://www.researchgate.net/publication/337580746_On_the_Learnability_of_Software_Router_Performance_via_CPU_Measurements
https://doi.org/10.1145/3360468.3366776
},
doi = {10.1145/3360468.3366776},
isbn = {9781450370066},
year = {2019},
date = {2019-01-01},
booktitle = {Proc.~of CoNEXT},
pages = {23\textendash25},
publisher = {Association for Computing Machinery},
address = {Orlando, FL, USA},
series = {CoNEXT ’19},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2018
On Biases in Information Retrieval Models and Evaluation PhD Thesis
Lipani, Aldo
TU Wien, 2018.
@phdthesis{PhDLipani2018,
title = {On Biases in Information Retrieval Models and Evaluation},
author = {Aldo Lipani},
url = {http://aldolipani.com/wp-content/uploads/2018/09/phd_thesis.pdf},
doi = {10.13140/RG.2.2.28623.74400},
year = {2018},
date = {2018-09-21},
school = {TU Wien},
abstract = {The advent of the modern information technology has benefited society as the digitisation of content increased over the last half-century. While the processing capability of our species has remained unchanged, the information available to us has been notably increasing. In this overload of information, Information Retrieval (IR) has been playing a prominent role by developing systems capable of separating relevant information from the rest. This separation, however, is a difficult task rooted in the complexity of understanding of what is and what is not relevant. To manage this complexity, IR has developed a strong empirical nature, which has led to the development of grounded retrieval models, resulting in the development of retrieval systems empirically designed to be biased towards relevant information. However, other biases have been observed, which counteract retrieval performance. In this thesis, the reduction of retrieval systems to filters of information, or sampling processes, has allowed us to systematically investigate these biases.
We study biases manifesting in two aspects of IR research: retrieval models and retrieval evaluation. We start by identifying retrieval biases in probabilistic IR models and then develop new document priors to improve retrieval performance. Next, we discuss the accessibility bias of retrieval models, and for Boolean retrieval models we develop a mathematical framework of retrievability. For retrieval evaluation biases, we study how test collections are built using the pooling method and how this method introduces bias. Then, to improve the reliability of the evaluation, we first develop new pooling strategies to mitigate this bias at test collection build time and then, for two IR evaluation measures, Precision and Recall at cut-off (P@n and R@n), we develop new pool bias estimators to mitigate it at evaluation time.
Through a large scale experimentation involving up to 15 test collections, four IR evaluation measures and three bias measures, we demonstrate that including document priors based on verboseness improves the performance of probabilistic retrieval models; that the accessibility bias of Boolean retrieval models quickly worsens for conjunctive queries with the increase of the query length (while slightly improving for disjunctive queries); that the test collection bias can be lowered at test collection build time by pooling strategies inspired by a well-known problem in reinforcement learning, the multi-armed bandit problem; and that this bias can also be improved at evaluation time by analysing the runs participating in the pool. For this last point in particular, we show that for P@n, bias reduction is done by quantifying the potential of the new system against the pooled runs, and for R@n, this is done instead by simulating the absence of a pooled run from the set of pooled runs.
This thesis contributes to the IR field by giving a better understanding of relevance through the lens of biases in retrieval models and retrieval evaluation. The identification of these biases, and their exploitation or mitigation, leads to the development of better performing IR models and the improvement of the current IR evaluation practice.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
A Systematic Approach to Normalization in Probabilistic Models Journal Article
Lipani, Aldo; Roelleke, Thomas; Lupu, Mihai; Hanbury, Allan
In: Information Retrieval Journal, 2018.
@article{Lipani2018,
title = {A Systematic Approach to Normalization in Probabilistic Models},
author = {Aldo Lipani and Thomas Roelleke and Mihai Lupu and Allan Hanbury},
doi = {10.1007/s10791-018-9334-1},
year = {2018},
date = {2018-06-30},
journal = {Information Retrieval Journal},
abstract = {Every information retrieval (IR) model embeds in its scoring function a form of term frequency (TF) quantification. The contribution of the term frequency is determined by the properties of the function of the chosen TF quantification, and by its TF normalization. The first defines how independent the occurrences of multiple terms are, while the second acts on mitigating the a priori probability of having a high term frequency in a document (estimation usually based on the document length). New test collections, coming from different domains (e.g. medical, legal), give evidence that not only document length, but in addition, verboseness of documents should be explicitly considered. Therefore we propose and investigate a systematic combination of document verboseness and length. To theoretically justify the combination, we show the duality between document verboseness and length. In addition, we investigate the duality between verboseness and other components of IR models. We test these new TF normalizations on four suitable test collections. We do this on a well defined spectrum of TF quantifications. Finally, based on the theoretical and experimental observations, we show how the two components of this new normalization, document verboseness and length, interact with each other. Our experiments demonstrate that the new models never underperform existing models, while sometimes introducing statistically significantly better results, at no additional computational cost.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2017
Fixed-Cost Pooling Strategies Based on IR Evaluation Measures Proceedings Article
Lipani, Aldo; Palotti, Joao; Lupu, Mihai; Piroi, Florina; Zuccon, Guido; Hanbury, Allan
In: Proc. of ECIR, 2017.
@inproceedings{Lipani2017,
title = {Fixed-Cost Pooling Strategies Based on IR Evaluation Measures},
author = {Aldo Lipani and Joao Palotti and Mihai Lupu and Florina Piroi and Guido Zuccon and Allan Hanbury},
doi = {10.1007/978-3-319-56608-5_28},
year = {2017},
date = {2017-01-01},
booktitle = {Proc.~of ECIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Visual Pool: A Tool to Visualize and Interact with the Pooling Method Proceedings Article
Lipani, Aldo; Lupu, Mihai; Hanbury, Allan
In: Proc. of SIGIR, 2017.
@inproceedings{Lipani:2017:VPT:3077136.3084146,
title = {Visual Pool: A Tool to Visualize and Interact with the Pooling Method},
author = {Aldo Lipani and Mihai Lupu and Allan Hanbury},
doi = {10.1145/3077136.3084146},
year = {2017},
date = {2017-01-01},
booktitle = {Proc.~of SIGIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Fixed Budget Pooling Strategies Based on Fusion Methods Proceedings Article
Lipani, Aldo; Lupu, Mihai; Palotti, Joao; Zuccon, Guido; Hanbury, Allan
In: Proc. of SAC, 2017.
@inproceedings{Lipani:2017:FBP:3019612.3019692,
title = {Fixed Budget Pooling Strategies Based on Fusion Methods},
author = {Aldo Lipani and Mihai Lupu and Joao Palotti and Guido Zuccon and Allan Hanbury},
doi = {10.1145/3019612.3019692},
year = {2017},
date = {2017-01-01},
booktitle = {Proc.~of SAC},
abstract = {The empirical nature of Information Retrieval (IR) mandates strong experimental practices. The Cranfield/TREC evaluation paradigm represents a keystone of such experimental practices. Within this paradigm, the generation of relevance judgments has been the subject of intense scientific investigation. This is because, on one hand, consistent, precise and numerous judgements are key to reduce evaluation uncertainty and test collection bias; on the other hand, however, relevance judgements are costly to collect. The selection of which documents to judge for relevance (known as pooling) has therefore great impact in IR evaluation. In this paper, we contribute a set of 8 novel pooling strategies based on retrieval fusion methods. We show that the choice of the pooling strategy has significant effects on the cost needed to obtain an unbiased test collection; we also identify the best performing pooling strategy according to three evaluation measures.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
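Spatio-temporal topsoil organic carbon mapping of a semi-arid Mediterranean region: The role of land use, soil texture, topographic indices and the influence of remote sensing data to modelling Journal Article
Schillaci, Calogero; Acutis, Marco; Lombardo, Luigi; Lipani, Aldo; Fantappiè, Maria; Märker, Michael; Saia, Sergio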
In: Science of The Total Environment, vol. 601-602, 2017.
@article{SCHILLACI2017821,
title = {Spatio-temporal topsoil organic carbon mapping of a semi-arid Mediterranean region: The role of land use, soil texture, topographic indices and the influence of remote sensing data to modelling},
author = {Calogero Schillaci and Marco Acutis and Luigi Lombardo and Aldo Lipani and Maria Fantappi\`{e} and Michael M\"{a}rker and Sergio Saia},
doi = {10.1016/j.scitotenv.2017.05.239},
year = {2017},
date = {2017-01-01},
journal = {Science of The Total Environment},
volume = {601-602},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2016
Fairness in Information Retrieval Proceedings Article
Lipani, Aldo
In: Proc. of SIGIR, 2016.
@inproceedings{Lipani:2016:FIR:2911451.2911473,
title = {Fairness in Information Retrieval},
author = {Aldo Lipani},
doi = {10.1145/2911451.2911473},
year = {2016},
date = {2016-01-01},
booktitle = {Proc.~of SIGIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
The Solitude of Relevant Documents in the Pool Proceedings Article
Lipani, Aldo; Lupu, Mihai; Kanoulas, Evangelos; Hanbury, Allan
In: Proc. of CIKM, 2016.
@inproceedings{Lipani:2016:SRD:2983323.2983891,
title = {The Solitude of Relevant Documents in the Pool},
author = {Aldo Lipani and Mihai Lupu and Evangelos Kanoulas and Allan Hanbury},
doi = {10.1145/2983323.2983891},
year = {2016},
date = {2016-01-01},
booktitle = {Proc.~of CIKM},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
The Curious Incidence of Bias Corrections in the Pool Proceedings Article
Lipani, Aldo; Lupu, Mihai; Hanbury, Allan
In: Proc. of ECIR, 2016.
@inproceedings{Lipani2016,
title = {The Curious Incidence of Bias Corrections in the Pool},
author = {Aldo Lipani and Mihai Lupu and Allan Hanbury},
doi = {10.1007/978-3-319-30671-1_20},
year = {2016},
date = {2016-01-01},
booktitle = {Proc.~of ECIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
The Impact of Fixed-Cost Pooling Strategies on Test Collection Bias Proceedings Article
Lipani, Aldo; Zuccon, Guido; Lupu, Mihai; Koopman, Bevan; Hanbury, Allan
In: Proc. of ICTIR, 2016.
@inproceedings{Lipani:2016:IFP:2970398.2970429,
title = {The Impact of Fixed-Cost Pooling Strategies on Test Collection Bias},
author = {Aldo Lipani and Guido Zuccon and Mihai Lupu and Bevan Koopman and Allan Hanbury},
doi = {10.1145/2970398.2970429},
year = {2016},
date = {2016-01-01},
booktitle = {Proc.~of ICTIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2015
DASyR(IR) - Document Analysis System for Systematic Reviews (in Information Retrieval) Proceedings Article
Piroi, Florina; Lipani, Aldo; Lupu, Mihai; Hanbury, Allan
In: Proc. of ICDAR, 2015.
@inproceedings{7333830,
title = {DASyR(IR) - Document Analysis System for Systematic Reviews (in Information Retrieval)},
author = {Florina Piroi and Aldo Lipani and Mihai Lupu and Allan Hanbury},
doi = {10.1109/ICDAR.2015.7333830},
year = {2015},
date = {2015-08-01},
booktitle = {Proc.~of ICDAR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Verboseness Fission for BM25 Document Length Normalization Proceedings Article
Lipani, Aldo; Lupu, Mihai; Hanbury, Allan; Aizawa, Akiko
In: Proc. of ICTIR, 2015.
@inproceedings{Lipani:2015:VFB:2808194.2809486,
title = {Verboseness Fission for BM25 Document Length Normalization},
author = {Aldo Lipani and Mihai Lupu and Allan Hanbury and Akiko Aizawa},
doi = {10.1145/2808194.2809486},
year = {2015},
date = {2015-01-01},
booktitle = {Proc.~of ICTIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
An Initial Analytical Exploration of Retrievability Proceedings Article
Lipani, Aldo; Lupu, Mihai; Aizawa, Akiko; Hanbury, Allan
In: Proc. of ICTIR, 2015.
@inproceedings{Lipani:2015:IAE:2808194.2809495,
title = {An Initial Analytical Exploration of Retrievability},
author = {Aldo Lipani and Mihai Lupu and Akiko Aizawa and Allan Hanbury},
doi = {10.1145/2808194.2809495},
year = {2015},
date = {2015-01-01},
booktitle = {Proc.~of ICTIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Splitting Water: Precision and Anti-Precision to Reduce Pool Bias Proceedings Article
Lipani, Aldo; Lupu, Mihai; Hanbury, Allan
In: Proc. of SIGIR, 2015.
@inproceedings{Lipani:2015:SWP:2766462.2767749,
title = {Splitting Water: Precision and Anti-Precision to Reduce Pool Bias},
author = {Aldo Lipani and Mihai Lupu and Allan Hanbury},
doi = {10.1145/2766462.2767749},
year = {2015},
date = {2015-01-01},
booktitle = {Proc.~of SIGIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2014
TUW-IMP at the NTCIR-11 Math-2 Proceedings Article
Lipani, Aldo; Andersson, Linda; Piroi, Florina; Lupu, Mihai; Hanbury, Allan
In: Proc. of NTCIR, 2014.
@inproceedings{Lipani2014TUWIMPAT,
title = {TUW-IMP at the NTCIR-11 Math-2},
author = {Aldo Lipani and Linda Andersson and Florina Piroi and Mihai Lupu and Allan Hanbury},
doi = {10.13140/2.1.1127.8404},
year = {2014},
date = {2014-01-01},
booktitle = {Proc.~of NTCIR},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Extracting Nanopublications from IR Papers Proceedings Article
Lipani, Aldo; Piroi, Florina; Andersson, Linda; Hanbury, Allan
In: Proc. of IRFC, 2014.
@inproceedings{Lipani2014b,
title = {Extracting Nanopublications from IR Papers},
author = {Aldo Lipani and Florina Piroi and Linda Andersson and Allan Hanbury},
doi = {10.1007/978-3-319-12979-2_5},
year = {2014},
date = {2014-01-01},
booktitle = {Proc.~of IRFC},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
An Information Retrieval Ontology for Information Retrieval Nanopublications Proceedings Article
Lipani, Aldo; Piroi, Florina; Andersson, Linda; Hanbury, Allan
In: Proc. of CLEF, 2014.
@inproceedings{Lipani2014c,
title = {An Information Retrieval Ontology for Information Retrieval Nanopublications},
author = {Aldo Lipani and Florina Piroi and Linda Andersson and Allan Hanbury},
doi = {10.1007/978-3-319-11382-1_5},
year = {2014},
date = {2014-01-01},
booktitle = {Proc.~of CLEF},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}