Data science is a field of study that combines mathematics and statistics, programming, advanced analytics, artificial intelligence (AI), and machine learning (ML) with specific subject matter expertise to analyze data. Data scientists work across the Census Bureau and play a vital role in the Bureau’s efforts to incorporate new data sources and emerging technologies in its programs.
Adaptive survey design is data-driven tailoring of surveys to increase the quality of data, increase efficiency, and reduce costs. The data used to tailor a survey can include information the Census Bureau already has about the survey, such as response rates on earlier surveys, or real-time information about the performance of the survey in the field as it is conducted.
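As a rough illustration of that data-driven tailoring, the sketch below prioritizes nonrespondent follow-up by a historical response propensity. All case data, field names, and thresholds here are hypothetical, not an actual Census Bureau rule.

```python
# Minimal sketch of adaptive survey design: rank open cases so that those
# with the lowest predicted response propensity (hypothetical values) get
# the most intensive follow-up mode first.

def prioritize_followups(cases):
    """Order cases ascending by propensity: hardest-to-reach first."""
    return sorted(cases, key=lambda c: c["propensity"])

open_cases = [
    {"id": "A", "propensity": 0.72},  # likely to respond by mail
    {"id": "B", "propensity": 0.18},  # historically hard to reach
    {"id": "C", "propensity": 0.45},
]

for case in prioritize_followups(open_cases):
    # Illustrative mode assignment; the 0.3 cutoff is invented.
    mode = "in-person visit" if case["propensity"] < 0.3 else "phone/mail reminder"
    print(case["id"], mode)
```

In practice the propensities themselves would be re-estimated from paradata as the survey runs, which is what makes the design adaptive rather than fixed.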
Artificial Intelligence refers to computer systems capable of performing complex tasks that historically only a human could do, such as decision making, reasoning, speech recognition and language translation.
Data analytics is the process for analyzing datasets to identify patterns and draw conclusions. Data analytics is used at the Census Bureau for a wide variety of research topics that examine the nation’s people and economy and how society is changing over time. As more data become available, the Census Bureau is adapting to the need to analyze larger and more complex datasets, using new techniques to extract novel insights.
Imputation refers to the procedure of using alternative values in place of missing data. It is referred to as “unit imputation” when an entire missing record is replaced and as “item imputation” when a single missing item within a record is replaced. Once all missing values have been imputed, the data set can then be analyzed using standard techniques for complete data.
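A minimal sketch of item imputation, using mean substitution for one variable; the variable and values are invented, and production systems use far more sophisticated methods (e.g., model-based or donor-based imputation).

```python
# Item imputation by mean substitution: replace missing values (None)
# with the mean of the observed values for that variable.

def mean_impute(values):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

incomes = [52_000, None, 48_000, None, 50_000]  # hypothetical data
print(mean_impute(incomes))  # missing items replaced by the mean, 50000.0
```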
Machine learning refers to a set of computer science techniques that allow computers to discover patterns in the data without being explicitly programmed to detect specific pattern types. The Census Bureau has a rich history of using computational tools to learn about populations and the economy. Machine learning encompasses these methods and includes an additional set of highly efficient and effective modeling techniques that can be used to impute, classify, or predict patterns in the data.
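To make “discovering patterns without being explicitly programmed” concrete, here is a toy one-nearest-neighbor classifier: no rule for the pattern is written down, yet the label is recovered from labeled examples. The features, labels, and data are entirely made up for illustration.

```python
# Toy machine-learning classifier: 1-nearest-neighbor. The "pattern" is
# never coded explicitly; it emerges from the labeled training examples.

def nearest_neighbor(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda ex: sq_dist(ex[0], point))
    return label

# (feature_1, feature_2) -> label; hypothetical data
train = [
    ((40, 1), "household"),
    ((38, 1), "household"),
    ((60, 12), "business"),
    ((55, 9), "business"),
]
print(nearest_neighbor(train, (58, 10)))  # -> business
```

The same interface generalizes: swap in a regression model to predict, or a donor-matching rule to impute, which is how classification, prediction, and imputation fit under one umbrella.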
Named entity recognition (NER) is a subtask of natural language processing that seeks to locate and classify named entities in unstructured text corresponding to pre-defined categories such as persons, organizations, or dates.
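A deliberately simplified stand-in for NER: a regex tagger that locates one pre-defined category (dates) in unstructured text. Real NER systems use statistical or neural models rather than hand-written patterns, so treat this only as an illustration of the locate-and-classify task.

```python
import re

# Toy "NER" for a single category: find date mentions and tag them DATE.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{1,2}, \d{4}\b"
)

def tag_dates(text):
    """Return (entity text, category) pairs found in `text`."""
    return [(m.group(), "DATE") for m in DATE_PATTERN.finditer(text)]

text = "The decennial census reference day was April 1, 2020."
print(tag_dates(text))  # -> [('April 1, 2020', 'DATE')]
```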
Natural language processing (NLP) refers to the branch of computer science—more specifically, the branch of artificial intelligence or AI—concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models.
Record linkage or entity resolution is the task of finding records that refer to the same entity across different data sources. Record linkage is necessary when joining different data sets based on entities that may or may not share a common identifier. Records or units from different data sources are joined together into a single file using non-unique identifiers, such as names, dates of birth, addresses, and other characteristics.
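The sketch below links records without a shared identifier by combining an exact date-of-birth comparison with a fuzzy name comparison. The field names, threshold, and records are illustrative, not any production linkage rule.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(name.lower().split())

def is_match(rec_a, rec_b, threshold=0.85):
    """Declare a link when dates of birth agree exactly and names are similar.
    The 0.85 similarity threshold is an invented example value."""
    same_dob = rec_a["dob"] == rec_b["dob"]
    name_sim = SequenceMatcher(
        None, normalize(rec_a["name"]), normalize(rec_b["name"])
    ).ratio()
    return same_dob and name_sim >= threshold

a = {"name": "Jonathan Q. Smith", "dob": "1980-03-14"}
b = {"name": "Jonathan Q Smith",  "dob": "1980-03-14"}
c = {"name": "J. Smith",          "dob": "1975-07-02"}

print(is_match(a, b))  # True: same DOB, near-identical names
print(is_match(a, c))  # False: different date of birth
```

Real linkage systems typically score many field comparisons at once (e.g., Fellegi–Sunter-style probabilistic weights) rather than a single similarity cutoff.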
Web scraping is the process of creating automated programs to extract content and data from websites. At the Census Bureau, web scraping involves identifying, extracting, and parsing targeted data for analysis that could potentially augment responses to Census surveys, thus enhancing coverage. Data acquisition from the web offers advantages such as controlled data harvesting and transparency.
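A minimal extract-and-parse sketch using only Python's standard library: it pulls table-cell text out of an HTML snippet. The snippet is hard-coded here; a real scraper would fetch pages over HTTP and respect the target site's terms of use and robots.txt.

```python
from html.parser import HTMLParser

class CellExtractor(HTMLParser):
    """Collect the text content of <td> cells from an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

# Hypothetical page fragment standing in for a fetched web page.
html = "<table><tr><td>Widgets</td><td>1,250</td></tr></table>"
parser = CellExtractor()
parser.feed(html)
print(parser.cells)  # -> ['Widgets', '1,250']
```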