dominik-pichler/Balmung

```
            ,-----.
           #,-. ,-.#
          () a   e ()
          (   (_)   )
          #\_  -  _/#
        ,'   `"""`    `.
      ,'      \X/      `.
     /         X     ____\
    /          v   ,`  v  `,
   /    /         ( <==+==> )
   `-._/|__________\   ^   /
  (\\)  |______@____\  ^  /
    \\  |     ( )    \ ^ /
     )  |             \^/
    (   |             |v
   <(^)>|             |
     v  |             |
        |             |
        |_.--.__ .--._|
          `==='  `==='

```

Thoughts and code about Information, Chaos and everything in between.

Thoughts

Information

This repo tries to gradually find more sophisticated answers to those questions, either through code or rambling-style posts. Feel free to contact me if you're interested.

Chaos

This section tries to focus on two fundamental questions:

Implications for Machine Learning


Tools

1. What do you know? - Kant's Knowledge Graph

Question:
How can I visualise ideas and how can I determine connections between different ideas?
For this approach I tried to turn philosophical ideas into knowledge graphs.

Three different approaches were used to identify entities and relationships:

  1. Using rule-based parsing systems
  2. Using an LLM (Llama3) to extract entities and relationships via prompting
  3. Using BERT to extract entities and relationships directly

Finally, the results were visualised using pyvis. As the input size increased, simply visualising all entities and their relationships became infeasible. Hence, this project is on hold until I've solved the question of "What is the most essential information?".
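To make the rule-based approach a bit more concrete, here is a minimal sketch of pattern-based triple extraction. It is not the parser used in this repo: the patterns, relation names, example sentences and the helper names (`extract_triples`, `build_graph`, `most_connected`) are all made up for illustration. The degree count at the end is just one naive stand-in for "most essential", not an answer to that question.

```python
import re
from collections import defaultdict

# Hypothetical rules: each regex captures a subject and an object
# around a fixed relation phrase. Real text needs far richer rules.
PATTERNS = [
    (re.compile(r"^(?P<subj>.+?) is a kind of (?P<obj>.+?)\.?$", re.I), "is_a"),
    (re.compile(r"^(?P<subj>.+?) depends on (?P<obj>.+?)\.?$", re.I), "depends_on"),
    (re.compile(r"^(?P<subj>.+?) precedes (?P<obj>.+?)\.?$", re.I), "precedes"),
]


def extract_triples(sentences):
    """Return (subject, relation, object) triples for sentences matching a rule."""
    triples = []
    for sentence in sentences:
        text = sentence.strip()
        for pattern, relation in PATTERNS:
            match = pattern.match(text)
            if match:
                subj = match["subj"].strip().lower()
                obj = match["obj"].strip().lower()
                triples.append((subj, relation, obj))
                break  # first matching rule wins
    return triples


def build_graph(triples):
    """Adjacency list: subject -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for subj, relation, obj in triples:
        graph[subj].append((relation, obj))
    return dict(graph)


def most_connected(triples, top_n=3):
    """Rank entities by how many triples they appear in (ties broken by name)."""
    degree = defaultdict(int)
    for subj, _, obj in triples:
        degree[subj] += 1
        degree[obj] += 1
    return sorted(degree, key=lambda node: (-degree[node], node))[:top_n]


if __name__ == "__main__":
    sentences = [
        "Experience depends on intuition.",
        "Knowledge depends on experience.",
        "Intuition is a kind of representation.",
        "The sky is blue.",  # matches no rule and is skipped
    ]
    triples = extract_triples(sentences)
    print(build_graph(triples))
    print(most_connected(triples, top_n=2))
```

The adjacency list from `build_graph` could then be handed to a visualisation library by adding one node per entity and one edge per triple.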

2. Ishmael's Guide to (Topic) Fishing

Question:
How can I understand which (sub)topics are central in a given document corpus or, more specifically, a (research) area?
Fishing for understanding in a field that is new to you can easily become an aimless wander through a dark forest of (pseudo) knowledge. One might need a navigation system to find the central intellectual building blocks of this new field of interest. The aim of this project is to build exactly this navigation system by developing a tool that automatically identifies central ideas and topics in a given field.

Theoretical Overview:
For a general understanding, a comprehensive list of modern topic modelling techniques can be found here:
Modern Topic Modelling Approaches

3. Rank me if you can - Neural (Re-) Rankers

Question:
How can I find the most relevant documents for a given endeavour in a large pool of documents?

For this quest, I manually implemented, trained and evaluated two prominent neural re-ranking algorithms (K-NRM and TK).

Code and results can be found here: Re-Rankers
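
For orientation, the scoring step of K-NRM can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation linked above: the embeddings, kernel means and widths, and the weights are made-up values (in the real model the weights and embeddings are learned), and the function names are hypothetical. Each RBF kernel soft-counts how many document terms have a cosine similarity near its mean, the counts are log-summed over query terms, and a linear layer with tanh produces the score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kernel_features(query_vecs, doc_vecs, mus, sigmas):
    """One feature per kernel: sum over query terms of log soft-match counts."""
    features = []
    for mu, sigma in zip(mus, sigmas):
        total = 0.0
        for q in query_vecs:
            # Soft term frequency: RBF kernel applied to each query-doc similarity.
            soft_tf = sum(
                math.exp(-((cosine(q, d) - mu) ** 2) / (2 * sigma ** 2))
                for d in doc_vecs
            )
            # Clamp before the log to avoid log(0) for empty kernels.
            total += math.log(max(soft_tf, 1e-10))
        features.append(total)
    return features


def knrm_score(query_vecs, doc_vecs, mus, sigmas, weights, bias=0.0):
    """Linear layer plus tanh over the kernel features."""
    feats = kernel_features(query_vecs, doc_vecs, mus, sigmas)
    return math.tanh(sum(w * f for w, f in zip(weights, feats)) + bias)


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real models use learned word embeddings.
    query = [[1.0, 0.0]]
    doc_a = [[1.0, 0.0], [0.0, 1.0]]  # contains an exact match for the query term
    doc_b = [[0.0, 1.0], [0.0, 1.0]]  # no match at all
    mus, sigmas = [1.0, 0.0], [0.1, 0.1]
    weights = [1.0, 0.0]  # assumed weights; normally learned from relevance data
    print(knrm_score(query, doc_a, mus, sigmas, weights))
    print(knrm_score(query, doc_b, mus, sigmas, weights))
```

With these toy inputs, the document containing the exact match scores higher than the one without, which is the behaviour the kernel at mean 1.0 is meant to capture.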

4. Ask me anything - RAG-based QA System

Question:
How can I extend project 3 to handle arbitrary (out-of-vocabulary, OOV) queries?
Code and results can be found here: Q&A

5. Compress me - (Neural) Data-Compressors

During the projects listed above, I worked with lossless compression algorithms to reduce data sizes (and to identify symbolic morphemes), and in the process implemented the following algorithms:

  • Shannon-Fano coding

    • Turned out to be a pretty good start, losslessly compressing Kant's Critique of Pure Reason by 55%
  • Neural Compressors:

    • Variable Rate Semantic Compression
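
As a rough illustration of the first item, here is a minimal character-level Shannon-Fano sketch. It is not the code from this repo: the function names and the sample string are assumptions, and it works on characters of a short string rather than a whole book. Symbols are sorted by frequency, the list is split where the two halves have the most similar total counts, one half gets a `0` and the other a `1`, and the split recurses.

```python
from collections import Counter


def shannon_fano(text):
    """Build a prefix code {symbol: bitstring} using Shannon-Fano splitting."""
    freq = Counter(text)
    # Most frequent first; ties broken by symbol for deterministic output.
    symbols = sorted(freq, key=lambda s: (-freq[s], s))
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freq[s] for s in group)
        running, best_diff, cut = 0, None, 1
        # Find the cut that makes both halves as equal in weight as possible.
        for i in range(1, len(group)):
            running += freq[group[i - 1]]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_diff, cut = diff, i
        left, right = group[:cut], group[cut:]
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    if len(symbols) == 1:
        codes[symbols[0]] = "0"  # a lone symbol still needs one bit
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode by reading bits until the buffer matches a code word."""
    reverse = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            out.append(reverse[buffer])
            buffer = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "critique of pure reason"
    codes = shannon_fano(sample)
    bits = encode(sample, codes)
    assert decode(bits, codes) == sample
    print(f"{len(sample) * 8} bits as 8-bit characters, {len(bits)} bits encoded")
```

The comparison against 8 bits per character ignores the cost of storing the code table, so real compression ratios on a full text will differ from what this toy prints.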

About

Poking knowledge, hoping it won't bite
