Stars
9 stars, written in Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Efficient Triton Kernels for LLM Training
SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
Learning to Generate Explanation from e-Hospital Services for Medical Suggestion
Contrastive learning of per-round participant representations in thread-based debates.
Analysis Model of Discourse Relations within a Document (AMDRD)