Commit

Merge pull request #34 from DHRI-Curriculum/dyoong-suggested-terms
Adding terms for data literacies
kallewesterling authored Aug 4, 2020
2 parents 616ecb1 + 98b3f3c commit 5b3386a
Showing 12 changed files with 137 additions and 0 deletions.
11 changes: 11 additions & 0 deletions terms/csv.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# CSV (file format)

CSV, or Comma-Separated Values, uses---you guessed it!---commas to separate values. Each line is a new "record," and each value within a line (separated by commas) is a new "field"; the first line (First Name, Last Name) names the fields. This format stores tabular data in a clean way that facilitates transfer between different data architectures. As data formats go, it is very rudimentary (even predating personal computers!) and is easy to type, without needing special characters beyond a comma.

```
First Name,Last Name
Smally,McTiny
Kitty,Kitty
Foots,Smith
Tiger,Jaws
```
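
As a minimal sketch of how such a file can be read, Python's standard-library `csv` module can parse the cat records above. The text is inlined as a string here (an assumption made for illustration; in practice you would open a file), and the variable names are hypothetical.

```python
import csv
import io

# The CSV text from the example above, inlined so the sketch is self-contained.
csv_text = """First Name,Last Name
Smally,McTiny
Kitty,Kitty
Foots,Smith
Tiger,Jaws
"""

# DictReader treats the first line as field names;
# every later line becomes one record keyed by those names.
records = list(csv.DictReader(io.StringIO(csv_text)))

for record in records:
    print(record["First Name"], record["Last Name"])
```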
9 changes: 9 additions & 0 deletions terms/data.md
@@ -0,0 +1,9 @@
# Data

There are many different perspectives on what counts as data. Some cite data as "material or information" on which "an argument, theory, test or hypothesis, or another research output is based" ([Queensland University of Technology](http://www.mopp.qut.edu.au/D/D_02_08.jsp)), while others critique the understanding of data as "mere descriptions of a priori conditions" ([Johanna Drucker](http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html)). Data, in our case, are subjective (because of our interests and assumptions) and are the materials and/or information necessary to come to our conclusions.

## Readings

- Johanna Drucker's [Humanities Approaches to Graphical Display](http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html)
- Matthew Salganik's [Readymade v. Custommade Data](https://www.bitbybitbook.com/en/1st-ed/introduction/themes/)
- Catherine D'Ignazio and Lauren Klein's [The Numbers Don't Speak for Themselves](https://data-feminism.mitpress.mit.edu/)
9 changes: 9 additions & 0 deletions terms/desc-analysis.md
@@ -0,0 +1,9 @@
# Descriptive Analysis

Descriptive analysis comprises techniques geared towards summarizing a data set, such as:

- Mean (the arithmetic average)
- Median
- Mode
- Standard deviation
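
The summaries listed above can be sketched with Python's standard-library `statistics` module. The data set here (ages of seven cats) is hypothetical and only meant to illustrate the calls.

```python
import statistics

# Hypothetical data: ages (in years) of seven cats.
ages = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(ages)      # arithmetic average of the values
median = statistics.median(ages)  # middle value once the data are sorted
mode = statistics.mode(ages)      # most frequently occurring value
stdev = statistics.stdev(ages)    # sample standard deviation (spread around the mean)

print(mean, median, mode, round(stdev, 2))
```

Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the data covered the entire population.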
5 changes: 5 additions & 0 deletions terms/high-quality-data.md
@@ -0,0 +1,5 @@
# High Quality Data

High-quality data is often understood as valid, accurate, complete, consistent, and uniform. These qualities are often achieved through the cleaning process.

Measurements are valid when they conform to set constraints. They are accurate when they represent the correct values (often requiring cross-referencing trusted external sources). They are complete when they represent everything that might be known and are consistent when observations do not contradict each other. Measurements are uniform when the same unit of measure is used in all relevant measurements.
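
Some of these checks can be automated. The following is a rough sketch, assuming hypothetical cat-weight records and a made-up constraint (weights must be positive); it covers completeness, validity, and uniformity only, since accuracy and consistency usually require outside sources or domain knowledge.

```python
# Hypothetical records: cat weights, one in pounds, one missing, one impossible.
records = [
    {"name": "Smally", "weight": 3.2, "unit": "kg"},
    {"name": "Kitty", "weight": 9.0, "unit": "lb"},
    {"name": "Foots", "weight": None, "unit": "kg"},
    {"name": "Tiger", "weight": -4.0, "unit": "kg"},
]

def is_complete(record):
    """Complete: no field is missing."""
    return all(value is not None for value in record.values())

def is_valid(record):
    """Valid: the weight conforms to a set constraint (assumed here: a positive number)."""
    return record["weight"] is not None and record["weight"] > 0

def is_uniform(records):
    """Uniform: every record uses the same unit of measure."""
    return len({record["unit"] for record in records}) == 1

# Names of records that would need attention during cleaning.
incomplete = [r["name"] for r in records if not is_complete(r)]
invalid = [r["name"] for r in records if is_complete(r) and not is_valid(r)]

print("Incomplete:", incomplete)
print("Invalid:", invalid)
print("Uniform units:", is_uniform(records))
```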
6 changes: 6 additions & 0 deletions terms/inferential-analysis.md
@@ -0,0 +1,6 @@
# Inferential Analysis

Inferential analysis comprises techniques geared towards testing a hypothesis about a population, based on your data set, such as:

- Extrapolation
- P-value calculation
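
As one illustration of a p-value calculation, the sketch below runs an exact permutation test, which is only one of many possible approaches and not necessarily the one a given project would use. The two groups of measurements are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two groups of four cats each.
group_a = [12, 14, 15, 17]
group_b = [8, 9, 10, 11]

observed = mean(group_a) - mean(group_b)
pooled = group_a + group_b

# Exact permutation test: relabel every possible set of four pooled values
# as "group A" and count how often the difference in means is at least as
# large as the one actually observed.
extreme = 0
total = 0
for indices in combinations(range(len(pooled)), len(group_a)):
    a = [pooled[i] for i in indices]
    b = [pooled[i] for i in range(len(pooled)) if i not in indices]
    total += 1
    if mean(a) - mean(b) >= observed:
        extreme += 1

# One-sided p-value: share of relabellings at least as extreme as the data.
p_value = extreme / total
print(p_value)
```

A small p-value suggests that a difference this large would rarely arise if the group labels were arbitrary.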
26 changes: 26 additions & 0 deletions terms/json.md
@@ -0,0 +1,26 @@
# JSON (file format)

JSON, or JavaScript Object Notation, uses a nested structure like XML, but with the addition of "key/value" pairs, like the `firstName` key, which is tied to the `Smally` value (at least for the first cat!). JSON is popular with web applications that save and send data between your browser and web servers, because its syntax is based on JavaScript, the main language of web browsers.

```json
{
  "Cats": [
    {
      "firstName": "Smally",
      "lastName": "McTiny"
    },
    {
      "firstName": "Kitty",
      "lastName": "Kitty"
    },
    {
      "firstName": "Foots",
      "lastName": "Smith"
    },
    {
      "firstName": "Tiger",
      "lastName": "Jaws"
    }
  ]
}
```
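
To show how the key/value pairs are accessed in practice, here is a minimal sketch using Python's standard-library `json` module. The JSON text is inlined as a string for illustration, and the variable names are hypothetical.

```python
import json

# The JSON document from the example above, condensed and inlined.
json_text = """
{
  "Cats": [
    {"firstName": "Smally", "lastName": "McTiny"},
    {"firstName": "Kitty", "lastName": "Kitty"},
    {"firstName": "Foots", "lastName": "Smith"},
    {"firstName": "Tiger", "lastName": "Jaws"}
  ]
}
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists.
data = json.loads(json_text)

# Look up a value by its key: the first cat's firstName.
first_cat = data["Cats"][0]
print(first_cat["firstName"])

# Walk the nested list to build each cat's full name.
full_names = [f'{cat["firstName"]} {cat["lastName"]}' for cat in data["Cats"]]
print(full_names)
```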
8 changes: 8 additions & 0 deletions terms/open-data-formats.md
@@ -0,0 +1,8 @@
# Open Data Formats

Open data formats are file formats that are available to anyone, free of charge, which allows for accessibility, future-proofing, and preservation. These file formats also allow for easy reusability and aid research reproduction and accountability. They are not limited by intellectual property rights or copyrights, which distinguishes them from proprietary formats. Some examples of open data formats are .csv, .pdf, and .json.

## Readings

- Library of Congress [Recommended Formats Statement](https://www.loc.gov/preservation/resources/rfs/)
- Stanford University's [best practices for file formats](https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-formats)
8 changes: 8 additions & 0 deletions terms/proprietary-data-fornats.md
@@ -0,0 +1,8 @@
# Proprietary Data Formats

Proprietary data formats are file formats that rely on dedicated, licensed software and/or systems. These file formats are often copyrighted, patented, or otherwise restricted, and often require a fee or paid software to open. They are usually discouraged in research projects, especially those intended to be shared with wider publics and audiences. This is distinct from open data formats. Some examples include .xlsx, .doc, and .3ds.

## Readings

- Library of Congress [Recommended Formats Statement](https://www.loc.gov/preservation/resources/rfs/)
- Stanford University's [best practices for file formats](https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-formats)
11 changes: 11 additions & 0 deletions terms/qual-analysis.md
@@ -0,0 +1,11 @@
# Qualitative Analysis

Qualitative analysis comprises techniques geared towards understanding a phenomenon, rather than predicting and testing hypotheses, such as:

- Grounded Theory/Computational Grounded Theory
- Content Analysis
- Text Analysis

## Readings

- [Computational Grounded Theory: A Methodological Framework](https://drive.google.com/file/d/0BxI6W5IIG74FeEtGbjQ0WF9uM0U/view)
9 changes: 9 additions & 0 deletions terms/raw-data.md
@@ -0,0 +1,9 @@
# "Raw" Data

"Raw" data has yet to be processed, meaning it has not yet been manipulated by a human or computer. Received or collected data could be in any number of formats, locations, etc. It could be in any of the forms listed in the previous section.

But "raw" is a relative term: when one person finishes processing data and presents it as a finished product, another person may take that product and work on it further, and for them that data is "raw" data.

## Readings

- Johanna Drucker's [Humanities Approaches to Graphical Display](http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html)
11 changes: 11 additions & 0 deletions terms/tidy-data.md
@@ -0,0 +1,11 @@
# Tidy Data

Tidy data is data that has been processed and organized into a structure that follows these rules:

1. Each variable is in a column.
2. Each observation is a row.
3. Each value is a cell.
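
The three rules above can be sketched with a small reshaping example in plain Python. The "wide" table, its column names, and the `weight_kg` field are all hypothetical, chosen only to show an untidy layout being made tidy.

```python
# Hypothetical "wide" table: one row per cat, one column per year's weight.
# It is untidy because the year (a variable) is hidden in the column names.
wide = [
    {"name": "Smally", "2019": 3.1, "2020": 3.4},
    {"name": "Kitty", "2019": 4.0, "2020": 4.2},
]

# Tidy version: each row is one observation (one cat in one year),
# each column is one variable (name, year, weight_kg),
# and each cell holds a single value.
tidy = [
    {"name": row["name"], "year": int(year), "weight_kg": weight}
    for row in wide
    for year, weight in row.items()
    if year != "name"
]

for observation in tidy:
    print(observation)
```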

## Readings

- [Tidy Data](https://www.jstatsoft.org/article/view/v059i10)
24 changes: 24 additions & 0 deletions terms/xml.md
@@ -0,0 +1,24 @@
# XML (file format)

XML, or eXtensible Markup Language, is a file format that uses a nested structure where "tags" like `<Cat>` contain other tags inside them, like `<firstName>`. This format is good for organizing the layout of a document in a tree-like format, just like HTML, where we want to nest elements like a sentence within a paragraph, for example. XML does not carry any information about how it should be displayed and can be used in a variety of presentation scenarios.

```xml
<Cats>
  <Cat>
    <firstName>Smally</firstName>
    <lastName>McTiny</lastName>
  </Cat>
  <Cat>
    <firstName>Kitty</firstName>
    <lastName>Kitty</lastName>
  </Cat>
  <Cat>
    <firstName>Foots</firstName>
    <lastName>Smith</lastName>
  </Cat>
  <Cat>
    <firstName>Tiger</firstName>
    <lastName>Jaws</lastName>
  </Cat>
</Cats>
```
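
As a minimal sketch of walking this tree, Python's standard-library `xml.etree.ElementTree` module can parse the document and read the nested tags. The XML is inlined as a string for illustration, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The XML document from the example above, condensed and inlined.
xml_text = """<Cats>
  <Cat><firstName>Smally</firstName><lastName>McTiny</lastName></Cat>
  <Cat><firstName>Kitty</firstName><lastName>Kitty</lastName></Cat>
  <Cat><firstName>Foots</firstName><lastName>Smith</lastName></Cat>
  <Cat><firstName>Tiger</firstName><lastName>Jaws</lastName></Cat>
</Cats>"""

# fromstring parses the text and returns the root element (<Cats>).
root = ET.fromstring(xml_text)

# Each <Cat> is a child of the root; its name tags are nested one level deeper.
names = [
    (cat.findtext("firstName"), cat.findtext("lastName"))
    for cat in root.findall("Cat")
]

for first, last in names:
    print(first, last)
```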
