-
KNAW Humanities Cluster & CLST, Radboud University
- Eindhoven, the Netherlands
- https://proycon.anaproy.nl
- https://orcid.org/0000-0002-1046-0006
- @[email protected]
-
python-timbl Public
python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ in…
-
dotfiles Public
My dotfiles (mirror of https://git.sr.ht/~proycon/dotfiles)
-
switchboard-tool-registry Public
Forked from clarin-eric/switchboard-tool-registry
The Switchboard Tool Registry
Python GNU General Public License v3.0 Updated Nov 28, 2024
-
colibri-core Public
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or…
-
colibri-utils Public
NLP utilities that rely on Colibri Core: currently only language identification
-
globalise-tools Public
Forked from knaw-huc/globalise-tools
tools for globalise tasks
Python Apache License 2.0 Updated Nov 21, 2024
-
foliapy Public
An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing…
-
homeassistant-config Public
My elaborate home automation configuration + scripts
-
foliatools Public
A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
-
python-frog Public
Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
-
homepage Public
My website (mirror of https://git.sr.ht/~proycon/homepage)
TeX Other Updated Oct 19, 2024
-
analiticcl Public
An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
-
alpino_clam_webservice Public
A CLAM-powered webservice for Alpino, a dependency parser for Dutch
-
sesdiff Public
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).
-
lingua-cli Public
A very small, simple command-line interface for language detection using lingua-rs
-
python-ucto Public
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. Thi…
-
cli-apps Public
Forked from toolleeo/awesome-cli-apps-in-a-csv
The largest Awesome Curated list of CLI/TUI applications with source data organized into CSV files
Python Updated Sep 10, 2024
-
lighthome Public
Lightweight home automation scripts and programs, over MQTT (mirror of https://git.sr.ht/~proycon/lighthome)
Shell Updated Sep 3, 2024
-
charfreq Public
Very simple command-line tool that counts (Unicode) character frequency from standard input
-
vocage Public
A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal
-
flat Public
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic anno…
-
lexmatch Public
Simple lexicon matcher against a text
-
codemetapy Public
A Python package for generating and working with codemeta
-
codemeta-harvester Public
Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
-
codemeta2mp Public
codemeta to SSHOC Open Marketplace converter
Python GNU General Public License v3.0 Updated May 21, 2024
-
folia Public
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of…
-
ucto_webservice Public
Webservice for Ucto, a rule-based tokeniser for multiple languages
Python Updated Mar 14, 2024
-
clam Public
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and…