TidyBlocks: Glossary

aggregation : A synonym for summarize.

Boolean : A logical value that is either true or false. The name comes from the mathematician George Boole.

column : Every column of a table contains zero or more values, one for each row, and is referred to by its column name. In statistics, a column is a variable that has been observed or measured, but we prefer the term "column" to avoid confusion with variables in programming languages.

column name : Every column in a table must have a distinct name (though columns in different tables can have the same names). A column name must start with the underscore '_' or a letter, and may only contain underscores, letters, and digits. TidyBlocks automatically creates names for some columns, such as _group_ for the column containing group IDs.
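
The naming rule above can be sketched as a regular expression. This is only an illustration: the function name `is_valid_column_name` is hypothetical, and we assume "letters" means the ASCII letters A-Z and a-z, which the glossary does not specify.

```python
import re

# Assumed pattern: the first character is an underscore or a letter, and
# every later character is an underscore, letter, or digit (ASCII only).
COLUMN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_column_name(name: str) -> bool:
    """Return True if `name` follows the column-name rule sketched above."""
    return COLUMN_NAME.fullmatch(name) is not None
```

Under these assumptions `_group_` and `temperature` are accepted, while `2nd` (starts with a digit) and `red-green` (contains a hyphen) are rejected.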

datetime : A moment in time represented by years, months, days, hours, minutes, and seconds. Datetimes are always stored in Universal Time Coordinates, and are also referred to as timestamps. Datetimes are written as "YYYY-MM-DD:hh:mm:ss".
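
As a rough sketch, text in the format given above could be read like this in Python. The helper name `parse_datetime` and the sample value are illustrative assumptions, not part of TidyBlocks.

```python
from datetime import datetime, timezone

# Format codes matching "YYYY-MM-DD:hh:mm:ss" as written in this glossary.
DATETIME_FORMAT = "%Y-%m-%d:%H:%M:%S"


def parse_datetime(text: str) -> datetime:
    """Parse text in the glossary's format and mark the result as UTC."""
    return datetime.strptime(text, DATETIME_FORMAT).replace(tzinfo=timezone.utc)


# A made-up sample value used only for illustration.
moment = parse_datetime("2020-03-14:15:09:26")
```

Attaching the UTC timezone explicitly reflects the statement that datetimes are always stored in UTC.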

expression : Something that performs an operation to produce a value. Expressions can be combined to create new expressions: for example, the expression (temperature - 32) * (5 / 9) uses multiplication to combine two smaller expressions that use subtraction and division.
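
Written as Python, the example expression might look like the sketch below. The function name `fahrenheit_to_celsius` is our own label for what the expression computes.

```python
def fahrenheit_to_celsius(temperature: float) -> float:
    """Evaluate the glossary's example expression for one temperature."""
    # (temperature - 32) and (5 / 9) are the two smaller expressions;
    # multiplication combines them into a larger one.
    return (temperature - 32) * (5 / 9)
```

Because `5 / 9` cannot be stored exactly as a floating-point number, results may differ from the exact answer by a tiny amount.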

group : A subset of the rows in a table that have the same values in one or more columns. TidyBlocks gives each group a unique group ID.

group ID : The unique identifier for a group in a table. TidyBlocks automatically stores group IDs in a column called _group_.
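
One way to picture grouping is sketched below. The function `add_group_ids` and the sample rows are hypothetical, and numbering groups from 1 in order of first appearance is an assumption rather than documented TidyBlocks behavior.

```python
def add_group_ids(rows, columns):
    """Return copies of `rows` with a `_group_` column added.

    Rows with the same values in `columns` receive the same group ID.
    IDs start at 1 in order of first appearance (an assumption).
    """
    ids = {}
    result = []
    for row in rows:
        # The values in the grouping columns identify the group.
        key = tuple(row[column] for column in columns)
        if key not in ids:
            ids[key] = len(ids) + 1
        result.append({**row, "_group_": ids[key]})
    return result


# Made-up sample data: two red rows and one blue row.
rows = [
    {"color": "red", "size": 1},
    {"color": "blue", "size": 2},
    {"color": "red", "size": 3},
]
grouped = add_group_ids(rows, ["color"])
```

The two red rows end up sharing one group ID and the blue row gets another. The original rows are left unchanged, in keeping with the idea that blocks create a new table from an existing one.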

k-means clustering : An algorithm that divides data into k clusters by repeatedly assigning each point to the nearest cluster center and then moving each center to the average of the points assigned to it.
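
A minimal sketch of the standard k-means loop for two-dimensional points follows. It is not TidyBlocks' implementation: the function name `k_means`, the fixed number of iterations, and passing in the starting centers by hand are all simplifying assumptions.

```python
def k_means(points, centers, iterations=10):
    """Cluster 2D points, starting from the given initial centers.

    Assumes `iterations` is at least 1. Returns the final centers and
    the clusters from the last assignment step.
    """
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest current center.
        clusters = [[] for _ in centers]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each center to the mean of its points.
        # A center that attracted no points stays where it is.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers, clusters


# Made-up data: two points near the origin and two near (10, 11).
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, clusters = k_means(points, [(1, 1), (9, 9)])
```

Real implementations usually pick the starting centers randomly and stop once the centers no longer move, rather than running a fixed number of rounds.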

logical : A value that is either true or false.

missing value : A hole in a dataset. Missing values are often called NAs (short for "not available"). A missing value is technically not the same thing as Not a Number (NaN), but TidyBlocks treats them the same way.

NA : A missing value. (The abbreviation is short for "not available".)

NaN : Short for "Not a Number", this is a special value used to represent the result of a calculation that has no meaningful numeric answer, such as zero divided by zero. TidyBlocks doesn't store NaN, but instead treats it as a missing value.
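
The idea of treating NaN like any other missing value can be sketched as follows. Using Python's `None` to stand in for NA is an assumption made for this example, as is the helper name `is_missing`.

```python
import math


def is_missing(value) -> bool:
    """Treat both None (standing in for NA here) and NaN as missing."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


# Made-up values: two real numbers, one NA, and one NaN.
values = [1.5, None, float("nan"), 4.0]
present = [v for v in values if not is_missing(v)]
```

Only the two real numbers remain in `present`. Zero and empty text are still ordinary values, not missing ones.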

operation : Something that can be done to data, such as addition or extracting the month from a datetime.

record : A single set of related observations. Records are stored as rows in tables.

row : Every row of a table spans zero or more columns and stores a single related set of observations. Rows are often called records, and most operations in TidyBlocks work within rows.

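silhouette : A measure of how well clustered data is, based on comparing how close each point is to the other points in its own cluster with how close it is to the points in the nearest other cluster. Scores range from -1 to 1, and higher scores indicate better-separated clusters.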
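
The sketch below computes the usual mean silhouette score for two-dimensional points. It assumes the standard textbook definition, which may differ in detail from what TidyBlocks does; the function name `silhouette_score` and the sample clusters are ours. It needs at least two non-empty clusters and Python 3.8 or later for `math.dist`.

```python
import math


def silhouette_score(clusters):
    """Mean silhouette over all points; `clusters` is a list of lists of 2D points."""
    scores = []
    for i, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                # Common convention: a point alone in its cluster scores 0.
                scores.append(0.0)
                continue
            # a: mean distance from p to the other points in its own cluster
            # (the distance from p to itself is 0, so it adds nothing).
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: smallest mean distance from p to the points of another cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i and other
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up examples: one sensible clustering and one poor one.
tight = silhouette_score([[(0, 0), (0, 1)], [(10, 0), (10, 1)]])
loose = silhouette_score([[(0, 0), (10, 0)], [(0, 1), (10, 1)]])
```

Here `tight` comes out close to 1 because each point is much nearer to its own cluster than to the other one, while `loose` is negative because each point sits closer to the other cluster than to its own.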

string : A synonym for text.

summarize : Combine many values into one. Totalling, calculating the average, and finding the minimum are a few ways to summarize values.
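
For instance, the three summaries mentioned above could be computed as in this small sketch. The sample values are invented for illustration.

```python
from statistics import mean

# Made-up values from a single column.
values = [3, 1, 4, 1, 5]

# Each entry combines all five values into a single one.
summary = {
    "total": sum(values),
    "average": mean(values),
    "minimum": min(values),
}
```

Each summary reduces the whole list to one value, which is what distinguishes summarizing from operations that work within a single row.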

table : A set of rows and columns making up a single dataset. Most blocks in TidyBlocks create a new table from an existing one.

text : Data consisting of letters, digits, punctuation, spaces, and other characters. The text "123" looks the same as the number 123, but is a different kind of value.
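
The difference between text and numbers shows up as soon as we operate on them, as this short sketch illustrates. The variable names are ours.

```python
text_value = "123"    # text: three characters
number_value = 123    # number: one hundred twenty-three

# The two values look alike but are not equal, because their types differ.
same = text_value == number_value

# "Adding" to text joins characters; adding to a number does arithmetic.
joined = text_value + "4"
added = number_value + 4
```

Here `same` is `False`, `joined` is the text `"1234"`, and `added` is the number `127`.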

timestamp : A unique moment in time. TidyBlocks prefers the term datetime.

type : A kind of data. TidyBlocks has numbers (which can be integers or have fractional parts), text (often called strings by programmers), Booleans (also called logical values, although there aren't any illogical values), datetimes (also called timestamps), and a special marker for missing values.

Universal Time Coordinates : The standard time from which timezones are measured. Formally called Coordinated Universal Time and often abbreviated "UTC".

value : A single piece of data in a specific row and column of a table. Every value has a type, and some operations only work on some types of values.

variable : In statistics, something that has been observed or measured. Variables correspond to columns in tables; we often use the term "column" to avoid confusion with the idea of a variable in a programming language.