Postgraduate: recent fragmented notes
Showing 6 changed files with 121 additions and 0 deletions.
16 changes: 16 additions & 0 deletions
...14 Open Data Innovation/2021-02-05-licensing-web-of-data-open-data-discovery.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Licensing, Web of Data, and Open Data Discovery | ||
|
||
## 10 Open Data Principles | ||
|
||
1. Completeness (including metadata) | ||
2. ... | ||
3. Timeliness: release as soon as possible | ||
4. Ease of physical and electronic access | ||
5. Machine readability | ||
6. Non-discrimination | ||
7. Commonly owned or open standards | ||
8. Licensing | ||
9. Permanence | ||
10. Usage costs | ||
|
||
## Open Data Licensing |
22 changes: 22 additions & 0 deletions
Postgraduate/COMP6214 Open Data Innovation/2021-02-05-structures.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Open, Closed, Hybrid, and Open Data | ||
|
||
> Open Data means **anyone** can **freely access, use, modify, and share** for **any purpose** (subject, at most, to requirements that preserve provenance and openness). | ||
- Closed data requires permissions in order to access, use, modify, and share. | ||
- Hybrid data exposes a limited (open) view of otherwise proprietary (closed) data. (?) | ||
|
||
**Linked Data** refers to a set of _best practices_ for publishing and interlinking structured data on the Web. | ||
|
||
- Open data is a campaign | ||
- Linked Data can be applied to open, closed, or hybrid data. | ||
- _Best practices:_ | ||
- URIs are used as names for things | ||
- HTTP URIs are used so that people can look up the names | ||
- Standards such as RDF and SPARQL are used to provide and query useful information about each name | ||
- Other URIs are included for discovering more things | ||
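
A minimal sketch of the practices listed above, using plain Python tuples rather than a real RDF store. The `example.org` and `example.net` URIs, the sample data, and the `match` helper are hypothetical and exist only for illustration; a real deployment would publish the triples as RDF and query them with SPARQL.

```python
# Toy triple store: every thing and every relationship is named by an HTTP URI.
# All example.org / example.net URIs below are hypothetical.
EX = "http://example.org/"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "worksAt", EX + "uni"),
    (EX + "bob", EX + "worksAt", EX + "uni"),
    # A link into another dataset lets a client discover more things.
    (EX + "uni", SAME_AS, "http://other.example.net/id/uni"),
]

def match(pattern, store):
    """Return the triples matching an (s, p, o) pattern. None is a wildcard,
    loosely playing the role of a variable in a SPARQL triple pattern."""
    return [
        t for t in store
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]

# Roughly: SELECT ?s WHERE { ?s ex:worksAt ex:uni }
workers = [s for s, _, _ in match((None, EX + "worksAt", EX + "uni"), triples)]
print(workers)
```

The wildcard pattern is only a rough analogue of a SPARQL query; it does not support joins across several patterns.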
|
||
## Structures of Data | ||
|
||
- Tabular | ||
- Hierarchical | ||
- Network (Graph) |
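
As a rough illustration of the three structures, the sketch below holds the same enrolment data in each shape. The student names and the choice of Python dictionaries and sets are assumptions made for the example, not part of the original notes.

```python
# The same hypothetical enrolment data in three shapes.

# Tabular: rows with a fixed set of columns (like a CSV file).
table = [
    {"student": "alice", "module": "COMP6214"},
    {"student": "bob", "module": "COMP6214"},
    {"student": "alice", "module": "ELEC6242"},
]

# Hierarchical: a tree with one parent per child (like JSON or XML).
# "alice" has to be repeated under each module she takes.
tree = {}
for row in table:
    tree.setdefault(row["module"], []).append(row["student"])

# Network (graph): nodes joined by edges, with no single root.
# Each node appears once and can link to any number of other nodes.
graph = {}
for row in table:
    graph.setdefault(row["student"], set()).add(row["module"])
    graph.setdefault(row["module"], set()).add(row["student"])

print(tree)
print(sorted(graph["alice"]))
```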
42 changes: 42 additions & 0 deletions
Postgraduate/COMP6214 Open Data Innovation/2021-02-08-data-cleaning.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Data Cleaning | ||
|
||
Data Cleaning is the process of taking (semi-)raw data from one or more sources, bringing it up to a reliable quality for your applications, and keeping it there. | ||
|
||
Real-life data are often: | ||
|
||
- Incomplete | ||
- Inconsistent | ||
- Out-of-context | ||
- ... | ||
|
||
It's important to keep a note of the changes while cleaning the data. | ||
|
||
## Types of Error in Real-Life Data | ||
|
||
- Syntactic: violation of domain constraints | ||
- Semantic: discrepancies between the recorded values and the true values in real life | ||
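
A minimal sketch of the difference between the two error types. The record fields, the allowed ranges, and the `ground_truth` dictionary are hypothetical; in practice the real-world reference for a semantic check is usually the hard part to obtain.

```python
def syntactic_errors(record):
    """Checks that need only the record itself: domain constraints."""
    errors = []
    if not (0 <= record["age"] <= 150):
        errors.append("age outside 0..150")
    if not (1 <= record["birth_month"] <= 12):
        errors.append("birth_month outside 1..12")
    return errors

def semantic_errors(record, ground_truth):
    """Checks that need an external reference for what is true in real life."""
    return [
        f"{field}: recorded {record[field]!r}, actually {truth!r}"
        for field, truth in ground_truth.items()
        if record.get(field) != truth
    ]

record = {"name": "alice", "age": 230, "birth_month": 5, "city": "Leeds"}
print(syntactic_errors(record))                          # age violates its domain
print(semantic_errors(record, {"city": "Southampton"}))  # valid value, but wrong
```

A value such as `"Leeds"` passes every syntactic check, which is why semantic errors cannot be caught by a linter alone.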
|
||
## Properties of Clean Data | ||
|
||
### Information Completeness | ||
|
||
- Closed World Assumption (CWA): assuming the database contains every relevant real-world entity, although some of their values may be missing | ||
- Open World Assumption (OWA): assuming that, besides missing values, some relevant entities may be missing from the database too | ||
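
One way to see the practical difference, under the usual reading that CWA treats the database as listing every relevant entity while OWA allows entities to be absent: the same lookup gives a different answer for an entity that is not in the database. The `employees` data and the `is_employee` helper are hypothetical.

```python
# Hypothetical database of employees; None marks a missing value.
employees = {
    "alice": {"office": "B32"},
    "bob": {"office": None},   # entity present, value missing
}

def is_employee(name, assumption):
    """Answer 'is this person an employee?' under CWA or OWA."""
    if name in employees:
        return True
    if assumption == "CWA":
        return False   # every entity is listed, so absence means "no"
    return None        # OWA: entities may be missing, so the answer is unknown

print(is_employee("carol", "CWA"))  # False
print(is_employee("carol", "OWA"))  # None
```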
|
||
### Data Currency | ||
|
||
Timeliness; not out-of-date. | ||
|
||
## Data Validation | ||
|
||
- Consumers understand the data more easily | ||
- Programmers do less "defensive programming" | ||
- Producers can precisely define and validate the output | ||
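
A small sketch of how a declared schema supports these three points. The `SCHEMA` layout, its field names, and the `validate` function are assumptions made for illustration, not a specific validation standard.

```python
# Hypothetical schema a producer might publish alongside a dataset:
# field name -> (expected type, required?)
SCHEMA = {
    "station_id": (str, True),
    "reading": (float, True),
    "note": (str, False),
}

def validate(row, schema=SCHEMA):
    """Return a list of problems; an empty list means the row is valid."""
    problems = []
    for field, (kind, required) in schema.items():
        if field not in row:
            if required:
                problems.append(f"missing required field {field!r}")
        elif not isinstance(row[field], kind):
            problems.append(f"{field!r} should be {kind.__name__}")
    problems += [f"unexpected field {f!r}" for f in row if f not in schema]
    return problems

print(validate({"station_id": "S1", "reading": 3.2}))
print(validate({"station_id": 7, "extra": 1}))
```

Once rows are known to pass `validate`, downstream code can read `row["reading"]` without re-checking its presence or type, which is the reduction in defensive programming the list refers to.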
|
||
## Tools | ||
|
||
- Linter: CSVLint, JSONLint | ||
- for syntactic errors | ||
- _OpenRefine_ | ||
- _Excel_, or other spreadsheets | ||
- _Bespoke Scripts_ |
9 changes: 9 additions & 0 deletions
...raduate/COMP6214 Open Data Innovation/2021-02-12-ontology-and-data-modelling.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Ontology and Data Modelling | ||
|
||
An ontology is a model that represents some subject matter or domain. | ||
|
||
## Creating Your Own Ontology | ||
|
||
- Start with your own domain | ||
- Generalise things | ||
- Find relationships |
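
The three steps above can be sketched with Python classes. The university domain, the class names, and the placeholder lecturer name are hypothetical; a real ontology would more likely be written in a language such as RDFS or OWL than in Python.

```python
from dataclasses import dataclass, field

# Step 1: start with concrete things from your own domain (a university here).
# Step 2: generalise. Student and Lecturer are both kinds of Person.
@dataclass
class Person:
    name: str

@dataclass
class Student(Person):
    pass

@dataclass
class Lecturer(Person):
    pass

# Step 3: find relationships. A Lecturer teaches a Module; Students take it.
@dataclass
class Module:
    code: str
    taught_by: Lecturer
    taken_by: list = field(default_factory=list)

m = Module("COMP6214", taught_by=Lecturer("Dr Example"))
m.taken_by.append(Student("alice"))
print(isinstance(m.taught_by, Person), [s.name for s in m.taken_by])
```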
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# The MIPS Architecture |
31 changes: 31 additions & 0 deletions
Postgraduate/ELEC6242 Cryptography/2021-02-07-classic-ciphers.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Classic Ciphers | ||
|
||
Terminology: | ||
|
||
- **Plaintext**: message in a "clear" form | ||
- **Steganography**: the message's **existence is concealed (hidden)** | ||
- **Cryptography**: the message is in plain view, but its **meaning is concealed (hidden)** | ||
- **Cipher**: an operation applied to **characters** or groups of characters | ||
|
||
## Compression & Encryption | ||
|
||
Compression and encryption are both about manipulating information (not data or wisdom). | ||
|
||
- _Compression_ extracts information from the data to encode it as efficiently as possible. | ||
- _Encryption_ diffuses a **key** into the information as thoroughly as possible, and encodes the result. | ||
|
||
## The Best Approach | ||
|
||
The best practical way to transmit data securely is to: | ||
|
||
1. Compress the data | ||
- this **removes redundancy** in the plaintext | ||
- this also leaves less data to encrypt, so the (slow) encryption step finishes **faster** | ||
2. Encrypt | ||
3. Add error detection and recovery | ||
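
The ordering above can be sketched with the Python standard library. The XOR keystream below is a toy stand-in for a real cipher and must not be used for actual security; the function names and the 4-byte CRC-32 trailer are assumptions for the example. The checksum covers only detection of accidental errors, not recovery and not deliberate tampering.

```python
import hashlib
import zlib

def toy_keystream(key: bytes, n: int) -> bytes:
    """Hash-counter keystream for illustration only, NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def toy_xor(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, toy_keystream(key, len(data))))

def send(plaintext: bytes, key: bytes) -> bytes:
    compressed = zlib.compress(plaintext)                 # 1. remove redundancy
    encrypted = toy_xor(compressed, key)                  # 2. encrypt (toy stand-in)
    checksum = zlib.crc32(encrypted).to_bytes(4, "big")   # 3. error detection
    return encrypted + checksum

def receive(packet: bytes, key: bytes) -> bytes:
    encrypted, checksum = packet[:-4], packet[-4:]
    if zlib.crc32(encrypted).to_bytes(4, "big") != checksum:
        raise ValueError("transmission error detected")
    return zlib.decompress(toy_xor(encrypted, key))

message = b"attack at dawn " * 20
packet = send(message, b"secret key")
print(len(message), len(packet), receive(packet, b"secret key") == message)
```

Compressing after encryption would gain little, because well-encrypted data has almost no redundancy left for the compressor to remove.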
|
||
## Basic Cryptanalysis | ||
|
||
- Frequency analysis | ||
- "Crib": a known sequence of letters or words, e.g. _q is almost always followed by u_ | ||
- Make guesses |
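
A minimal sketch of frequency analysis. The Caesar (shift) cipher is not named in the notes and is chosen here only as a simple classic cipher to attack; the sample message and the guess that the commonest letter stands for "e" are likewise assumptions, and the guess can fail on short or unusual texts.

```python
from collections import Counter
import string

def caesar(text: str, shift: int) -> str:
    """Shift lowercase letters by `shift` places; leave other characters alone."""
    alphabet = string.ascii_lowercase
    k = shift % 26
    return text.translate(str.maketrans(alphabet, alphabet[k:] + alphabet[:k]))

def guess_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the commonest ciphertext letter stands for 'e'."""
    counts = Counter(c for c in ciphertext if c in string.ascii_lowercase)
    commonest = counts.most_common(1)[0][0]
    return (ord(commonest) - ord("e")) % 26

ciphertext = caesar("meet me here at eleven", 3)
shift = guess_shift(ciphertext)
print(shift, caesar(ciphertext, -shift))
```

If the first guess produces nonsense, the next most common English letters are natural candidates to try, which corresponds to the "make guesses" step.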