Dong Jun Kim junkim100

Hi there, I'm Dong Jun Kim 👋

I’m a Ph.D. researcher at Korea University’s NLP&AI Lab, specializing in Mechanistic Interpretability of Large Language Models (LLMs). My work focuses on reverse-engineering the inner workings of LLMs to uncover their decision-making processes, enhance transparency, and ensure alignment with human values. I am passionate about advancing AI research responsibly by bridging theoretical insights with practical applications.

🚀 Research Interests

Mechanistic Interpretability
Reverse-engineering LLMs to understand their internal circuits, algorithms, and emergent behaviors. My work includes:
- Developing sparse autoencoders to isolate interpretable features in transformer architectures
- Probing causal relationships between model components and specific capabilities
- Mapping information pathways in large-scale models to improve transparency and reliability

AI Safety
Ensuring the safe and ethical deployment of AI systems by focusing on:
- Automated attack detection (harmfulness/bias detection) using red-teaming techniques
- Mitigating biases in LLMs through interpretability-driven methods
- Designing scalable frameworks for aligning AI behavior with human values

Mechanistic Anomaly Detection
Identifying unexpected or harmful behaviors in LLMs by analyzing their internal mechanisms. Key contributions include:
- Developing tools to trace causal pathways for diagnosing anomalous outputs
- Designing self-monitoring models for real-world deployment scenarios
- Improving robustness under adversarial or high-stakes conditions

Reasoning Models & Agent Systems
Investigating how LLMs reason and interact as agents to perform complex tasks reliably. My research focuses on:
- Multi-step reasoning processes within transformer-based architectures
- Building RAG agent systems for dynamic knowledge retrieval and integration
- Exploring compositionality in neural networks for structured reasoning

Retrieval-Augmented Generation (RAG)
Enhancing generative models with retrieval mechanisms to improve factual accuracy and groundedness. My work includes:
- Designing retrieval pipelines optimized for domain-specific applications
- Reducing hallucinations by embedding retrieval mechanisms into transformer workflows
- Improving factual consistency in generative outputs through hybrid architectures

Cognitive Alignment & Ethical AI Design
Ensuring that AI systems operate consistently with ethical principles and human intentions. Contributions include:
- Embedding ethical guidelines into model training processes through scalable fine-tuning methods
- Leveraging interpretability tools to monitor alignment over time
- Collaborating across disciplines to design responsible AI frameworks

🛠 Core Expertise

- Mechanistic Interpretability: Developing novel techniques to analyze the internal structures and decision-making pathways of LLMs.
- AI Safety & Bias Mitigation: Creating robust frameworks to ensure ethical and safe deployment of advanced AI systems.
- RAG & Agent Systems: Designing retrieval-augmented generation pipelines and agent-based systems for dynamic decision-making tasks.

🧰 Tools & Technologies

Machine Learning Frameworks:

Tools & Platforms:

Optimization & Acceleration:

Additional Frameworks for LLM Development:

- LlamaIndex: Efficient data indexing and retrieval for LLM-driven applications.
- Haystack: End-to-end NLP framework for building search systems and conversational agents.
- LangFlow: Tool for orchestrating complex workflows in LangChain-based applications.
- Helicone: Observability platform tailored for monitoring LLM-powered applications.
- Gemini (Google): Advanced multimodal language models optimized for enterprise-level tasks.

📚 Education & Research Experience

Korea University (2024 – Present)

As a Ph.D. researcher at the NLP&AI Lab under Dr. Heui-Seok Lim, I have contributed to government- and industry-funded projects, including collaborations with the Ministry of Food and Drug Safety and KT Gen AI Lab. Key projects include:

- Developing a novel knowledge editing method for domain-specific applications without retraining models.
- Designing automatic attack detection frameworks (harmfulness/bias detection) using red-teaming techniques.
- Creating advanced RAG agent systems capable of dynamic knowledge retrieval for real-time decision-making tasks.

University of South Florida (2019 – 2023)

During my B.S. in Computer Science, I worked under Dr. Wanwan Li on augmented reality systems, focusing on automatic room mapping using SLAM algorithms. Additionally, I collaborated with Dr. Edwin Michael to develop agent-based models for pandemic simulations in Hillsborough County, contributing to public health planning during COVID-19.
