Commit
Merge pull request #34 from DHRI-Curriculum/dyoong-suggested-terms
Adding terms for data literacies
Showing 12 changed files with 137 additions and 0 deletions.
# CSV (file format)

CSV, or Comma-Separated Values, uses (you guessed it!) commas to separate values. The first line (`First Name,Last Name`) is a header that names each "field," or column. Every line after it is a new "record," and within each record the commas mark where one field ends and the next begins. This data format stores tabular data in a clean way that makes it easy to transfer between different data architectures. As data formats go, it is very rudimentary (it even predates personal computers!) and is easy to type, requiring no special characters beyond the comma.

```
First Name,Last Name
Smally,McTiny
Kitty,Kitty
Foots,Smith
Tiger,Jaws
```

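As a minimal sketch of how software reads this layout, Python's standard `csv` module treats the header line as the field names and every later line as one record. The variable names here (`CATS_CSV`, `records`) are illustrative, not part of any curriculum code:

```python
import csv
import io

# The sample CSV from above, held in a string so the example is self-contained.
CATS_CSV = """First Name,Last Name
Smally,McTiny
Kitty,Kitty
Foots,Smith
Tiger,Jaws
"""

# DictReader uses the first line as the header (the field names)
# and turns each following line into one record (a dict).
reader = csv.DictReader(io.StringIO(CATS_CSV))
records = list(reader)

print(reader.fieldnames)        # ['First Name', 'Last Name']
print(len(records))             # 4
print(records[0]["Last Name"])  # McTiny
```

Using the `csv` module rather than splitting each line on commas matters once a value itself contains a comma: such values are wrapped in double quotes, and the module handles that quoting for you.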
# Data

There are many different perspectives on what counts as data. Some describe data as the "material or information" on which "an argument, theory, test or hypothesis, or another research output is based" ([Queensland University of Technology](http://www.mopp.qut.edu.au/D/D_02_08.jsp)), while others critique the understanding of data as "mere descriptions of a priori conditions" ([Johanna Drucker](http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html)). In our case, data are subjective (shaped by our interests and assumptions): they are the materials and/or information we need to reach our conclusions.

## Readings

- Johanna Drucker's [Humanities Approaches to Graphical Display](http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html)
- Matthew Salganik's [Readymade v. Custommade Data](https://www.bitbybitbook.com/en/1st-ed/introduction/themes/)
- Catherine D'Ignazio and Lauren Klein's [The Numbers Don't Speak for Themselves](https://data-feminism.mitpress.mit.edu/)

# Descriptive Analysis

Descriptive analyses are techniques geared towards summarizing a data set, such as:

- Mean (the arithmetic average)
- Median
- Mode
- Standard deviation

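A minimal sketch of these summaries using Python's standard `statistics` module. The list of values is made up purely for illustration:

```python
import statistics

# A small, made-up data set (hypothetical values, for illustration only).
values = [2, 3, 3, 5, 7]

summary = {
    "mean": statistics.mean(values),      # sum / count = 20 / 5
    "median": statistics.median(values),  # middle value once sorted
    "mode": statistics.mode(values),      # most frequent value
    "stdev": statistics.stdev(values),    # sample standard deviation
}
print(summary)
```

For these values the mean is 4, the median and mode are both 3, and the sample standard deviation is 2.0. If your data set is the entire population rather than a sample, `statistics.pstdev` gives the population standard deviation instead.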
# High Quality Data

High quality data is often understood as valid, accurate, complete, consistent, and uniform. This is often achieved through the data cleaning process.

Measurements are valid when they conform to set constraints (for example, a weight cannot be negative). They are accurate when they represent the correct values (which often requires cross-referencing trusted external sources). They are complete when they represent everything that might be known, and they are consistent when observations do not contradict each other. Measurements are uniform when the same unit of measure is used in all relevant measurements.

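Some of these criteria can be checked mechanically during cleaning. The sketch below assumes hypothetical cat-weight records, field names, and a "weight must be positive" constraint, all invented for illustration. It checks completeness, validity, and uniformity only; accuracy and consistency usually need external sources or cross-record comparisons that a short example cannot capture.

```python
# Hypothetical measurement records; names, fields, and values are illustrative.
records = [
    {"name": "Smally", "weight": 3.2, "unit": "kg"},
    {"name": "Kitty", "weight": None, "unit": "kg"},
    {"name": "Foots", "weight": -1.0, "unit": "kg"},
    {"name": "Tiger", "weight": 9.5, "unit": "lb"},
]

def is_complete(record):
    """Complete: no field is missing a value."""
    return all(value is not None for value in record.values())

def is_valid(record):
    """Valid: the weight conforms to an assumed constraint (it must be positive)."""
    return record["weight"] is not None and record["weight"] > 0

def is_uniform(rows):
    """Uniform: every measurement uses the same unit."""
    return len({row["unit"] for row in rows}) == 1

incomplete = [r["name"] for r in records if not is_complete(r)]
invalid = [r["name"] for r in records if not is_valid(r)]

print(incomplete)           # ['Kitty']
print(invalid)              # ['Kitty', 'Foots']
print(is_uniform(records))  # False, because Tiger is measured in pounds
```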
# Inferential Analysis

Inferential analyses are techniques geared towards testing a hypothesis about a population based on your data set (a sample drawn from that population), such as:

- Extrapolation
- P-value calculation

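To make "P-value calculation" concrete, here is a minimal sketch of one classic case, an exact one-sided binomial test, using only the standard library. The coin-flip scenario and the function name `binomial_p_value` are hypothetical choices for illustration:

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided exact binomial test: the probability of observing `successes`
    or more successes in `trials` attempts if the null hypothesis
    (true success rate = p) holds."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical sample: 9 heads in 10 flips of a coin we suspect is biased.
p_value = binomial_p_value(9, 10)
print(round(p_value, 4))  # 0.0107
```

Here the p-value is 11/1024: a fair coin would produce 9 or more heads in 10 flips only about 1% of the time. Whether that counts as evidence against the null hypothesis depends on the significance threshold chosen in advance (0.05 is a common convention).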
# JSON (file format)

JSON, or JavaScript Object Notation, uses a nested structure like XML, but organizes data into "key/value" pairs: for example, the `firstName` key is tied to the value `Smally` (at least for the first cat!). JSON is popular with web applications that save data and send it between your browser and web servers, because its syntax is based on JavaScript, the main language of web browsers, which can read and write JSON natively.

```json
{
  "Cats": [
    {
      "firstName": "Smally",
      "lastName": "McTiny"
    },
    {
      "firstName": "Kitty",
      "lastName": "Kitty"
    },
    {
      "firstName": "Foots",
      "lastName": "Smith"
    },
    {
      "firstName": "Tiger",
      "lastName": "Jaws"
    }
  ]
}
```

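Although JSON comes from JavaScript, most languages can read it. As a minimal sketch, Python's standard `json` module turns JSON objects into dictionaries and arrays into lists, so looking up a key returns its value. The variable names here are illustrative:

```python
import json

# The JSON document from above, written more compactly and held in a string.
CATS_JSON = """
{
  "Cats": [
    {"firstName": "Smally", "lastName": "McTiny"},
    {"firstName": "Kitty", "lastName": "Kitty"},
    {"firstName": "Foots", "lastName": "Smith"},
    {"firstName": "Tiger", "lastName": "Jaws"}
  ]
}
"""

data = json.loads(CATS_JSON)   # objects become dicts, arrays become lists
first_cat = data["Cats"][0]
print(first_cat["firstName"])  # Smally

# Going the other way: serialize a Python dict back to JSON text.
print(json.dumps({"firstName": "Smally", "lastName": "McTiny"}))
# {"firstName": "Smally", "lastName": "McTiny"}
```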
# Open Data Formats

Open data formats are file formats that are available to anyone, free of charge, which supports accessibility, future-proofing, and preservation. These file formats also make data easy to reuse and aid research reproducibility and accountability. They are not restricted by intellectual property rights or copyrights, which distinguishes them from proprietary formats. Some examples of open data formats are .csv, .pdf, and .json.

## Readings

- Library of Congress [Recommended Formats Statement](https://www.loc.gov/preservation/resources/rfs/)
- Stanford University's [best practices for file formats](https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-formats)

# Proprietary Data Formats

Proprietary data formats are file formats that rely on dedicated, licensed software and/or systems. These file formats are often copyrighted, patented, or otherwise restricted, and opening them often requires paying a fee or buying software. They are usually discouraged in research projects, especially projects intended to be shared with wider publics and audiences. Proprietary formats stand in contrast to open data formats. Some examples include .xlsx, .doc, and .3ds.

## Readings

- Library of Congress [Recommended Formats Statement](https://www.loc.gov/preservation/resources/rfs/)
- Stanford University's [best practices for file formats](https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-formats)

# Qualitative Analysis

Qualitative analyses are techniques geared towards understanding a phenomenon, rather than making predictions and testing hypotheses, such as:

- Grounded Theory/Computational Grounded Theory
- Content Analysis
- Text Analysis
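
Computational text analysis often begins with something as simple as counting which words occur most often, as a way to surface patterns worth reading closely. A minimal sketch using only the standard library, assuming a made-up text snippet and a tiny hand-picked stop-word list (both hypothetical):

```python
import re
from collections import Counter

# Hypothetical snippet of interview text; in practice this would be your corpus.
text = """The archive was hard to use. The archive staff were helpful,
but the catalog was incomplete and hard to search."""

# Lowercase the text and split it into word tokens (letters and apostrophes).
tokens = re.findall(r"[a-z']+", text.lower())

# A tiny, illustrative stop-word list; real projects typically use a fuller one.
stop_words = {"the", "was", "to", "and", "but", "were"}
counts = Counter(t for t in tokens if t not in stop_words)

print(counts.most_common(3))
```

Counts like these do not interpret anything on their own; in a qualitative project they point the researcher back to passages to read and code by hand.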

## Readings

- Laura K. Nelson's [Computational Grounded Theory: A Methodological Framework](https://drive.google.com/file/d/0BxI6W5IIG74FeEtGbjQ0WF9uM0U/view)

# "Raw" Data

"Raw" data has yet to be processed, meaning it has not yet been manipulated by a human or computer. Received or collected data could be in any number of formats, locations, and so on. It could be in any of the file formats described elsewhere in this glossary, such as CSV, JSON, or XML.

But "raw" is a relative term: when one person finishes processing data and presents it as a finished product, another person may take that product and process it further, and for them it is "raw" data.

## Readings

- Johanna Drucker's [Humanities Approaches to Graphical Display](http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html)

# Tidy Data

Tidy data is a way of processing and organizing data into a structure that follows these rules:

1. Each variable is in a column.
2. Each observation is a row.
3. Each value is a cell.
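
A common way data breaks these rules is when values of a variable (here, years) are used as column headers. The minimal sketch below "melts" such a table into tidy form in plain Python, so that year and weight each get their own column and every row is one observation. The table, the field names, and the helper `tidy` are hypothetical; in practice, libraries such as pandas provide this reshaping as `melt`.

```python
# Hypothetical "messy" table: the years are spread across column headers,
# so one variable (year) is stored in the headers instead of in its own column.
messy = [
    {"name": "Smally", "2019": 3.1, "2020": 3.4},
    {"name": "Tiger", "2019": 8.7, "2020": 9.2},
]

def tidy(rows, id_field, var_name, value_name):
    """Melt wide rows into long form: one row per observation."""
    result = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue
            # Each former column header becomes a value of the new variable.
            result.append({id_field: row[id_field], var_name: key, value_name: value})
    return result

tidy_rows = tidy(messy, "name", "year", "weight")
for row in tidy_rows:
    print(row)
```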

## Readings

- Hadley Wickham's [Tidy Data](https://www.jstatsoft.org/article/view/v059i10)

# XML (file format)

XML, or eXtensible Markup Language, is a file format that uses a nested structure in which "tags" like `<Cat>` contain other tags, like `<firstName>`. Like HTML, this tree-like format is good for organizing the structure of a document, for example when we want to nest a sentence within a paragraph. XML itself carries no information about how it should be displayed, so the same data can be used in a variety of presentation scenarios.

```xml
<Cats>
  <Cat>
    <firstName>Smally</firstName>
    <lastName>McTiny</lastName>
  </Cat>
  <Cat>
    <firstName>Kitty</firstName>
    <lastName>Kitty</lastName>
  </Cat>
  <Cat>
    <firstName>Foots</firstName>
    <lastName>Smith</lastName>
  </Cat>
  <Cat>
    <firstName>Tiger</firstName>
    <lastName>Jaws</lastName>
  </Cat>
</Cats>
```
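
As a minimal sketch of walking this tree, Python's standard `xml.etree.ElementTree` module parses the document into nested elements: the outer `<Cats>` element is the root, and each `<Cat>` child holds its own `<firstName>` and `<lastName>` tags. The variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# The same data as the XML above, written more compactly and held in a string.
CATS_XML = """
<Cats>
  <Cat><firstName>Smally</firstName><lastName>McTiny</lastName></Cat>
  <Cat><firstName>Kitty</firstName><lastName>Kitty</lastName></Cat>
  <Cat><firstName>Foots</firstName><lastName>Smith</lastName></Cat>
  <Cat><firstName>Tiger</firstName><lastName>Jaws</lastName></Cat>
</Cats>
"""

root = ET.fromstring(CATS_XML.strip())  # the outermost <Cats> element

# Visit each <Cat> child and read the text inside its nested tags.
names = [(cat.findtext("firstName"), cat.findtext("lastName")) for cat in root.findall("Cat")]
for first, last in names:
    print(first, last)
```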