Research Computing Environments

Data Science

What is Data Science?

Data Science is a field of study that combines mathematics and statistics, programming, advanced analytics, artificial intelligence (AI), and machine learning (ML) with specific subject matter expertise to analyze data. Data scientists work across the Census Bureau and play a vital role in the Bureau’s efforts to incorporate new data sources and emerging technologies in its programs.

Adaptive survey design is data-driven tailoring of surveys to increase the quality of data, increase efficiency, and reduce costs. The data used to tailor a survey can include information the Census Bureau already has about the survey, such as response rates on earlier surveys, or real-time information about the performance of the survey in the field as it is conducted.

Artificial intelligence (AI) refers to computer systems capable of performing complex tasks that historically only a human could do, such as decision making, reasoning, speech recognition, and language translation.

Data analytics is the process of analyzing datasets to identify patterns and draw conclusions. Data analytics is used at the Census Bureau for a wide variety of research topics that examine the nation’s people and economy and how society is changing over time. As more data become available, the Census Bureau is adapting to the need to analyze larger and more complex datasets, using new techniques to extract novel insights.

Imputation is the procedure of substituting alternative values for missing data. It is called “unit imputation” when an entire data point is replaced and “item imputation” when a single constituent of a data point is replaced. Once all missing values have been imputed, the dataset can be analyzed using standard techniques for complete data.
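
As an illustration only, item imputation can be sketched with a simple mean fill. The records, field names, and the impute_item_mean helper below are hypothetical, and mean substitution is just one of many possible imputation methods; it is not presented as the method the Census Bureau uses.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing item.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 46, "income": None},
    {"id": 4, "age": 40, "income": 58000},
]


def impute_item_mean(rows, field):
    """Return copies of rows with missing `field` values replaced by the observed mean."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


# Fill each field in turn; the original records are left unchanged.
completed = impute_item_mean(impute_item_mean(records, "age"), "income")
```

After this step every record has a value for every field, so the completed list can be passed to routines that expect complete data.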

Machine learning refers to a set of computer science techniques that allow computers to discover patterns in the data without being explicitly programmed to detect specific pattern types. The Census Bureau has a rich history of using computational tools to learn about populations and the economy. Machine learning encompasses these methods and includes an additional set of highly efficient and effective modeling techniques that can be used to impute, classify, or predict patterns in the data. 
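
A minimal sketch of the idea, assuming a tiny set of made-up labeled examples: the nearest-neighbor classifier below is never given a rule for what makes a business "small" or "large"; it assigns the label of the most similar example it has seen. The data, labels, and the predict_nearest name are hypothetical, and nearest-neighbor classification is only one of many machine learning techniques.

```python
import math

# Hypothetical labeled examples: (employees, annual revenue in $M) -> size class.
training = [
    ((5, 0.4), "small"),
    ((12, 1.1), "small"),
    ((150, 20.0), "large"),
    ((300, 45.0), "large"),
]


def predict_nearest(features, examples):
    """Classify `features` with the label of the closest training example."""
    _, label = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return label


prediction_a = predict_nearest((8, 0.7), training)
prediction_b = predict_nearest((220, 30.0), training)
```

In practice the features would usually be rescaled first so that no single measurement dominates the distance.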

Named entity recognition (NER) is a subtask of natural language processing (NLP) that seeks to locate named entities in unstructured text and classify them into pre-defined categories such as persons, organizations, or dates.
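
The locate-and-classify steps can be sketched with a toy rule-based tagger. Everything in the example is an assumption made for illustration: the GAZETTEER lookup, the date pattern, the category names, and the sample sentence. Production NER systems typically rely on statistical or deep learning models rather than a fixed list like this one.

```python
import re

# Hypothetical gazetteer: a tiny lookup of known names and their categories.
GAZETTEER = {
    "Census Bureau": "ORGANIZATION",
    "Jane Doe": "PERSON",
}

# Matches dates written like "February 6, 2025".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def find_entities(text):
    """Return (start, surface text, category) tuples sorted by position."""
    found = [(m.start(), m.group(), "DATE") for m in DATE_PATTERN.finditer(text)]
    for name, category in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), m.group(), category))
    return sorted(found)


entities = find_entities("Jane Doe visited the Census Bureau on February 6, 2025.")
```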

Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP combines computational linguistics (the rule-based modeling of human language) with statistical, machine learning, and deep learning models.

Record linkage, or entity resolution, is the task of finding records that refer to the same entity across different data sources. Record linkage is necessary when joining datasets based on entities that may or may not share a common identifier. Records or units from different data sources are joined together into a single file using non-unique identifiers, such as names, dates of birth, addresses, and other characteristics.
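
A minimal sketch of linking on non-unique identifiers, assuming two small made-up sources: records are paired when their dates of birth agree exactly and their names are similar enough. The sample records, the 0.8 threshold, and the similarity and link helpers are hypothetical choices for illustration; operational record linkage generally uses more careful probabilistic matching.

```python
from difflib import SequenceMatcher

# Hypothetical records from two sources with no shared identifier.
source_a = [
    {"name": "Katherine O'Neil", "dob": "1980-03-14"},
    {"name": "John Smith", "dob": "1975-11-02"},
]
source_b = [
    {"name": "Jon Smith", "dob": "1975-11-02"},
    {"name": "Kathrine ONeil", "dob": "1980-03-14"},
    {"name": "John Smith", "dob": "1990-01-01"},
]


def similarity(a, b):
    """Name similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left, right, threshold=0.8):
    """Pair records whose dates of birth match exactly and whose names are similar."""
    pairs = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            if a["dob"] == b["dob"] and similarity(a["name"], b["name"]) >= threshold:
                pairs.append((i, j))
    return pairs


# Each pair is (index in source_a, index in source_b).
links = link(source_a, source_b)
```

Requiring agreement on date of birth keeps the two "John Smith" records with different birth dates from being linked, even though their names are identical.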

Web scraping is the process of creating automated programs to extract content and data from websites. At the Census Bureau, web scraping involves identifying, extracting, and parsing targeted data for analysis that could potentially augment responses to Census surveys, thus enhancing coverage. Data acquisition from the web offers advantages such as controlled data harvesting and transparency.
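
The extracting and parsing steps can be sketched with Python's standard html.parser module. The PAGE markup, the "store" class name, and the StoreParser class are hypothetical; a real scraper would first download the page, and this sketch parses an inline string instead so that it runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would first download this HTML.
PAGE = """
<html><body>
  <h1>Store Locations</h1>
  <ul>
    <li class="store">Main Street Market</li>
    <li class="store">Harbor Grocery</li>
    <li class="note">Hours vary by location</li>
  </ul>
</body></html>
"""


class StoreParser(HTMLParser):
    """Collect the text of every <li class="store"> element."""

    def __init__(self):
        super().__init__()
        self.in_store = False
        self.stores = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "li" and ("class", "store") in attrs:
            self.in_store = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_store = False

    def handle_data(self, data):
        if self.in_store and data.strip():
            self.stores.append(data.strip())


parser = StoreParser()
parser.feed(PAGE)
stores = parser.stores
```

Only the targeted list items are kept; the heading and the note are ignored, which mirrors the identify, extract, and parse sequence described above.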

Page Last Revised - February 6, 2025