
Course: An introduction to data management and data cleaning for scientists

UFPel is hosting the Dutch professor Mark van der Loo as a visitor, as part of the activities of the CAPES PrInt project. To make the most of his visit, PRPPGI and CRInter invite the academic community to take part in the short course “An introduction to data management and data cleaning for scientists”. The initiative is also meant as a starting point for creating a data analysis course shared across all graduate programs (PPGs) at UFPel.

The short course will be held in four sessions in the academic auditorium of the Anglo campus, on the following days and times:

5 November – Tuesday – 16:00 to 18:00

7 November – Thursday – 16:00 to 18:00

11 November – Monday – 16:00 to 18:00

14 November – Thursday – 10:00 to 12:00

The course will be taught in English. Registration is by completing the online form, with a limit of 85 registrations.

Below is a brief summary of the course and of Prof. Mark van der Loo's background.

An introduction to data management and data cleaning for scientists

Working with data in a way that is both understandable and reproducible by others is a core competence for research scientists. This four-part lecture series will highlight principles, applications, and free software tools for data management, data processing, and data cleaning in scientific environments.

Topics to be discussed include

– The many faces of information: recognizing data usage and choosing the right structure.
– Data analyses as a value chain, separating tasks in a clear way.
– Managing data in organizations and information modeling.
– Working reproducibly.
– An introduction to systematic data cleaning and data validation for statistics.

The lecture series will be a mix of presentations, quizzes, and some practical assignments. In the later lectures we will do some work with R. No prior knowledge of R is assumed, but attendees who wish to join the assignments should bring a laptop with R preinstalled. Instructions for this will be posted with the lecture materials.

All lecture materials will be made available via

Mark van der Loo obtained his PhD in molecular physics in 2008 at Radboud University in the Netherlands. Since 2007 he has been a researcher in statistical methodology at Statistics Netherlands, the Dutch government institute of national statistics. His expertise is in statistical computing, with research focusing on data cleaning, data processing, and most recently applications of network science. His published works include peer-reviewed papers, software packages, and recently a book entitled ‘Statistical Data Cleaning with Applications in R’ with John Wiley & Sons. Mark’s software packages for R are applied by many users around the world, including statistical offices. As an instructor, he teaches a course in data management for the European Master’s of Official Statistics (EMOS) program for students of the universities of Utrecht and Leiden. In addition, he has more than ten years of experience teaching R, data science, data cleaning, machine learning, and text mining in both government and commercial settings.