Hi there, I'm Dong Jun Kim
I'm a Ph.D. researcher at Korea University's NLP&AI Lab, specializing in Mechanistic Interpretability of Large Language Models (LLMs). My work focuses on reverse-engineering the inner workings of LLMs to uncover their decision-making processes, enhance transparency, and ensure alignment with human values. I am passionate about advancing AI research responsibly by bridging theoretical insights with practical applications.
- Mechanistic Interpretability
  Reverse-engineering LLMs to understand their internal circuits, algorithms, and emergent behaviors. My work includes:
  - Developing sparse autoencoders to isolate interpretable features in transformer architectures
  - Probing causal relationships between model components and specific capabilities
  - Mapping information pathways in large-scale models to improve transparency and reliability
- AI Safety
  Ensuring the safe and ethical deployment of AI systems by focusing on:
  - Automated attack detection (harmfulness/bias detection) using red-teaming techniques
  - Mitigating biases in LLMs through interpretability-driven methods
  - Designing scalable frameworks for aligning AI behavior with human values
- Mechanistic Anomaly Detection
  Identifying unexpected or harmful behaviors in LLMs by analyzing their internal mechanisms. Key contributions include:
  - Developing tools to trace causal pathways for diagnosing anomalous outputs
  - Designing self-monitoring models for real-world deployment scenarios
  - Improving robustness under adversarial or high-stakes conditions
- Reasoning Models & Agent Systems
  Investigating how LLMs reason and interact as agents to perform complex tasks reliably. My research focuses on:
  - Multi-step reasoning processes within transformer-based architectures
  - Building RAG agent systems for dynamic knowledge retrieval and integration
  - Exploring compositionality in neural networks for structured reasoning
- Retrieval-Augmented Generation (RAG)
  Enhancing generative models with retrieval mechanisms to improve factual accuracy and groundedness. My work includes:
  - Designing retrieval pipelines optimized for domain-specific applications
  - Reducing hallucinations by embedding retrieval mechanisms into transformer workflows
  - Improving factual consistency in generative outputs through hybrid architectures
- Cognitive Alignment & Ethical AI Design
  Ensuring that AI systems operate consistently with ethical principles and human intentions. Contributions include:
  - Embedding ethical guidelines into model training processes through scalable fine-tuning methods
  - Leveraging interpretability tools to monitor alignment over time
  - Collaborating across disciplines to design responsible AI frameworks
- Mechanistic Interpretability: Developing novel techniques to analyze the internal structures and decision-making pathways of LLMs.
- AI Safety & Bias Mitigation: Creating robust frameworks to ensure ethical and safe deployment of advanced AI systems.
- RAG & Agent Systems: Designing retrieval-augmented generation pipelines and agent-based systems for dynamic decision-making tasks.
- LlamaIndex: Efficient data indexing and retrieval for LLM-driven applications.
- Haystack: End-to-end NLP framework for building search systems and conversational agents.
- LangFlow: Tool for orchestrating complex workflows in LangChain-based applications.
- Helicone: Observability platform tailored for monitoring LLM-powered applications.
- Gemini (Google): Advanced multimodal language models optimized for enterprise-level tasks.
As a Ph.D. researcher at the NLP&AI Lab under Dr. Heui-Seok Lim, I have contributed to government- and industry-funded projects, including collaborations with the Ministry of Food and Drug Safety and KT Gen AI Lab. Key projects include:
- Developing a novel knowledge editing method for domain-specific applications without retraining models.
- Designing automatic attack detection frameworks (harmfulness/bias detection) using red-teaming techniques.
- Creating advanced RAG agent systems capable of dynamic knowledge retrieval for real-time decision-making tasks.
During my B.S. in Computer Science, I worked under Dr. Wanwan Li on augmented reality systems, focusing on automatic room mapping using SLAM algorithms. Additionally, I collaborated with Dr. Edwin Michael to develop agent-based models for pandemic simulations in Hillsborough County, contributing to public health planning during COVID-19.