Postgraduate: recent fragmented notes
jyhi committed Feb 12, 2021
1 parent 4c99767 commit b4fc338
Showing 6 changed files with 121 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensing, Web of Data, and Open Data Discovery

## 10 Open Data Principles

1. Completeness (including metadata)
2. ...
3. Timeliness: release as soon as possible
4. Ease of physical and electronic access
5. Machine readability
6. Non-discrimination
7. Commonly owned or open standards
8. Licensing
9. Permanence
10. Usage costs

## Open Data Licensing
@@ -0,0 +1,22 @@
# Open, Closed, Hybrid, and Open Data

> Open Data means **anyone** can **freely access, use, modify, and share** for **any purpose** (subject, at most, to requirements that preserve provenance and openness).
- Closed data requires permissions in order to access, use, modify, and share.
- Hybrid data exposes a limited (open) view of otherwise proprietary (closed) data. (?)

**Linked Data** refers to a set of _best practices_ for publishing and interlinking structured data on the Web.

- Open data is a campaign
- Linked Data can be applied to open, closed, or hybrid data.
- _Best practices:_
  - URIs are used as names for things
  - HTTP URIs are used so that people can look up the names
  - Looking up a URI returns useful information in standard formats (RDF), which can be queried with SPARQL
  - Links to other URIs are included so that more things can be discovered
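
A minimal sketch of the idea behind these practices, in plain Python rather than a real RDF library. Statements are (subject, predicate, object) triples named by HTTP URIs, and a pattern with wildcards plays the role of a very simple SPARQL query. The `example.org` URIs and the `match` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical namespace; real Linked Data would use URIs that resolve over HTTP.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
    # A link to another URI lets a client discover more things.
    (EX + "bob", EX + "livesIn", "http://example.com/places/southampton"),
]


def match(pattern, store):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# Everything known about alice, then every name in the store.
about_alice = match((EX + "alice", None, None), triples)
names = [obj for _, _, obj in match((None, EX + "name", None), triples)]
```

Because every name is a URI, data from different publishers can refer to the same thing without agreeing on a shared database schema.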

## Structures of Data

- Tabular
- Hierarchical
- Network (Graph)
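
The three structures can be sketched in Python as follows. The staff records and the field names (`manager_id`, `reports`) are made up for illustration.

```python
# Tabular: rows that share the same columns.
table = [
    {"id": 1, "name": "Alice", "manager_id": None},
    {"id": 2, "name": "Bob", "manager_id": 1},
    {"id": 3, "name": "Carol", "manager_id": 1},
]


def build_tree(rows, manager_id=None):
    """Hierarchical: nest each row under its single parent."""
    return [
        {"name": row["name"], "reports": build_tree(rows, row["id"])}
        for row in rows
        if row["manager_id"] == manager_id
    ]


tree = build_tree(table)

# Network (graph): any node may link to any other, so cycles are allowed.
graph = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Alice"],
}
```

The same data can often be expressed in more than one structure; the tree above is derived from the table, while the graph can hold relationships (such as cycles) that a strict hierarchy cannot.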
@@ -0,0 +1,42 @@
# Data Cleaning

Data Cleaning is the process of taking (semi-)raw data from one or more sources and bringing it to, then maintaining it at, a quality that is reliable enough for your applications.

Real-life data are often:

- Incomplete
- Inconsistent
- Out-of-context
- ...

It's important to keep a note of the changes while cleaning the data.
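
One way to keep such a note is to have the cleaning step return a change log alongside the cleaned data. This is only a sketch: the fields (`name`, `country`) and the rules (trim whitespace, lower-case the country) are assumptions made for the example.

```python
def clean_rows(rows):
    """Return (cleaned_rows, changes) so that every edit is recorded."""
    cleaned, changes = [], []
    for index, row in enumerate(rows):
        fixed = {}
        for field, value in row.items():
            new_value = value.strip()
            if field == "country":
                new_value = new_value.lower()
            if new_value != value:
                # Record what changed, where, and the value before and after.
                changes.append(
                    {"row": index, "field": field, "before": value, "after": new_value}
                )
            fixed[field] = new_value
        cleaned.append(fixed)
    return cleaned, changes


raw = [
    {"name": " Alice ", "country": "UK"},
    {"name": "Bob", "country": "uk "},
]
cleaned, changes = clean_rows(raw)
```

The original rows are left untouched, so the log can be used to audit or undo any individual change.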

## Types of Error in Real-Life Data

- Syntactic: violation of domain constraints
- Semantic: discrepancies between the recorded values and the true values in the real world
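
The difference can be sketched in Python. A syntactic error can be found by looking at the value alone, whereas a semantic error only shows up when the value is compared with the real world, represented here by a trusted reference. The records, the age range, and the `truth` lookup are all hypothetical.

```python
def syntactic_errors(records):
    """Ids whose age violates the domain constraint: an integer in 0..150."""
    return [
        r["id"]
        for r in records
        if not (isinstance(r["age"], int) and 0 <= r["age"] <= 150)
    ]


def semantic_errors(records, truth):
    """Ids whose age differs from the trusted real-world value."""
    return [
        r["id"]
        for r in records
        if r["id"] in truth and r["age"] != truth[r["id"]]
    ]


records = [
    {"id": "a", "age": 34},
    {"id": "b", "age": -5},   # outside the domain: syntactic error
    {"id": "c", "age": 43},   # plausible but wrong: semantic error
]
truth = {"a": 34, "c": 34}

bad_syntax = syntactic_errors(records)
bad_meaning = semantic_errors(records, truth)
```

Semantic errors are usually harder to detect because a trusted reference is rarely available for every record.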

## Properties of Clean Data

### Information Completeness

- Closed World Assumption (CWA): assumes the database contains every relevant real-world entity, although some of their values may be missing
- Open World Assumption (OWA): assumes that, in addition to missing values, some entities may be missing from the database altogether

### Data Currency

Timeliness; not out-of-date.

## Data Validation

- Consumers understand the data more easily
- Programmers do less "defensive programming"
- Producers can precisely define and validate the output

## Tools

- Linters: CSVLint, JSONLint
  - for syntactic errors
- _OpenRefine_
- _Excel_, or other spreadsheets
- _Bespoke Scripts_
@@ -0,0 +1,9 @@
# Ontology and Data Modelling

An ontology is a model that represents some subject matter or domain.

## Creating Your Own Ontology

- Start with your own domain
- Generalise things
- Find relationships
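
As a rough illustration of these steps, the sketch below models a small university domain in plain Python. The class names, the `subclass_of` mapping, and the relationships are invented for the example; a real ontology would normally be written in a language such as RDFS or OWL.

```python
# Generalise things: each class points to its more general parent class.
subclass_of = {
    "Lecturer": "Staff",
    "Staff": "Person",
    "Student": "Person",
}

# Find relationships: (domain class, relationship, range class).
relationships = [
    ("Lecturer", "teaches", "Module"),
    ("Student", "enrolledIn", "Module"),
]


def ancestors(cls):
    """Walk up the generalisation chain from a class to its most general parent."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


lecturer_ancestors = ancestors("Lecturer")
```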
@@ -0,0 +1 @@
# The MIPS Architecture
31 changes: 31 additions & 0 deletions Postgraduate/ELEC6242 Cryptography/2021-02-07-classic-ciphers.md
@@ -0,0 +1,31 @@
# Classic Ciphers

Terminology:

- **Plaintext**: message in a "clear" form
- **Steganography**: message whose **existence is concealed (hidden)**
- **Cryptography**: message in plain view, but the **meaning is concealed (hidden)**.
- **Cipher**: the operation on groups of **characters**

## Compression & Encryption

Compression and encryption are both about manipulating information (not data or wisdom).

- _Compression_ extracts information from the data to encode it as efficiently as possible.
- _Encryption_ diffuses a **key** into the information as much as possible, and encodes the result.

## The Best Approach

In practice, the best way to transmit data securely is to:

1. Compress the data
   - this **removes redundancy** in the plaintext
   - this also makes encryption (which is slow) **faster**, since there is less data to encrypt
2. Encrypt
3. Add error detection and recovery
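
The order of the steps above can be sketched in Python. The XOR routine is a toy stand-in for a real cipher and is not secure; it is used only so that the example runs with the standard library. The CRC-32 checksum covers error detection only, so recovery (for example with an error-correcting code) is left out. The function names `send` and `receive` are hypothetical.

```python
import zlib


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR each byte with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def send(message: bytes, key: bytes) -> bytes:
    compressed = zlib.compress(message)                   # 1. compress
    encrypted = xor_cipher(compressed, key)               # 2. encrypt
    checksum = zlib.crc32(encrypted).to_bytes(4, "big")   # 3. error detection
    return encrypted + checksum


def receive(packet: bytes, key: bytes) -> bytes:
    encrypted, checksum = packet[:-4], packet[-4:]
    if zlib.crc32(encrypted).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch: packet corrupted in transit")
    # Undo the steps in reverse order: decrypt, then decompress.
    return zlib.decompress(xor_cipher(encrypted, key))


message = b"attack at dawn " * 20
key = b"secret"
packet = send(message, key)
restored = receive(packet, key)
```

Compression has to come first: well-encrypted output looks random, so it has little redundancy left for a compressor to remove.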

## Basic Cryptanalysis

- Frequency analysis
- "Crib": a known sequence of letters or words, e.g. _q is almost always followed by u_
- Make guesses
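
Frequency analysis can be illustrated on a Caesar (shift) cipher, which is an assumption here since the notes do not name a specific cipher. The sketch guesses that the most common ciphertext letter stands for `e`, the most common letter in English, and derives the shift from that. The guess only works when the message is long enough and has a typical letter distribution.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Shift each lowercase letter by `shift` places; leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def guess_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the most frequent letter encrypts 'e'."""
    counts = Counter(ch for ch in ciphertext if ch in ALPHABET)
    most_common_letter, _ = counts.most_common(1)[0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("e")) % 26


plaintext = "meet me here when the evening sentinel sleeps"
ciphertext = caesar(plaintext, 3)
shift = guess_shift(ciphertext)
recovered = caesar(ciphertext, -shift)
```

If the first guess produces gibberish, the next step is to try the second most common letter, or to look for a crib in the candidate plaintext.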
