data cleaning

[ˈdeɪtə ˈklinɪŋ]
noun
limpeza de dados
1. The process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset, database, or table
Data cleaning is essential before running any analytics to ensure the accuracy of results.
A limpeza de dados é essencial antes de realizar qualquer análise para garantir a precisão dos resultados.
2. The removal of incomplete, duplicate, or incorrectly formatted data from a dataset
We spent two weeks on data cleaning to remove all duplicates and fill missing values.
Passamos duas semanas na limpeza de dados para remover todos os duplicados e preencher os valores ausentes.
3. A pre-processing step in data science and machine learning workflows that prepares raw data for analysis
Data cleaning accounted for 80% of the project's total time investment.
A limpeza de dados representou 80% do investimento total de tempo do projeto.
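The operations named in senses 1 and 2 (removing duplicates, fixing formatting, filling missing values) can be illustrated with a minimal Python sketch. The record layout (`name`, `age`), the helper name `clean_records`, and the choice to fill gaps with the mean are all assumptions made for illustration; real projects choose rules that suit their data.

```python
from statistics import mean

def clean_records(records):
    """Normalize names, drop duplicates, and fill missing ages with the mean."""
    seen = set()
    deduped = []
    for rec in records:
        # Fix inconsistent formatting: trim whitespace, use title case.
        name = rec["name"].strip().title()
        key = (name, rec["age"])
        if key in seen:
            continue  # Duplicate record after normalization: skip it.
        seen.add(key)
        deduped.append({"name": name, "age": rec["age"]})

    # Fill missing ages with the mean of the known ages (an assumed rule).
    known = [r["age"] for r in deduped if r["age"] is not None]
    fill = mean(known) if known else None
    for r in deduped:
        if r["age"] is None:
            r["age"] = fill
    return deduped

# Hypothetical raw data with a duplicate, bad formatting, and a missing value.
raw = [
    {"name": " ana ", "age": 30},
    {"name": "Ana", "age": 30},
    {"name": "bruno", "age": None},
    {"name": "Carla", "age": 40},
]
cleaned = clean_records(raw)
```

Here the two "Ana" rows collapse into one once the name is normalized, and the missing age for "Bruno" is replaced by the mean of the remaining known ages.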
Data cleaning has become increasingly important in Brazilian tech companies and startups, particularly in fintech and e-commerce sectors. The term is widely understood among data scientists and analysts in both Brazil and the USA. In corporate environments, the process is sometimes referred to colloquially as 'data janitor work,' reflecting its unglamorous but essential nature in data science projects.
Synonyms / Sinônimos
data scrubbing, data sanitization, data preprocessing, data validation, data wrangling
Antonyms / Antônimos
data corruption, data pollution

Regional Variations

General Brazilian
limpeza de dados
Standard and most commonly used term in Brazilian Portuguese tech communities
São Paulo
limpeza de dados
Predominant usage in tech hubs and corporate environments
Rio de Janeiro
limpeza de dados
Same as general Brazilian; sometimes informally called 'sanitização de dados'
Portugal
limpeza de dados
Same term used, though sometimes 'depuração de dados' is also employed

Related Words

data quality, data validation, data transformation, missing values, outliers, duplicate records, data profiling

Related Idioms & Phrases

garbage in, garbage out (GIGO) - emphasizing that bad data leads to bad results
clean your data before analysis - common advisory phrase in data science