A Next.js application that uses Meta's Llama 3.2 Vision-Instruct multimodal model (meta-llama/Llama-3.2-11B-Vision-Instruct) to generate detailed natural-language descriptions of uploaded images.
- 🖼️ Drag-and-drop or click-to-upload image interface
- 🤖 Powered by Meta's Llama 3.2 Vision-Instruct model (11B parameters)
- 💫 Real-time streaming responses
- 📝 Comprehensive image analysis covering:
  - Primary subjects and their attributes
  - Environmental details
  - Spatial relationships
  - Lighting and color analysis
  - Atmospheric elements
  - Textural qualities
  - Compositional elements
- Next.js 14
- React
- TypeScript
- Hugging Face Inference API
- react-dropzone
- Tailwind CSS
This project uses the meta-llama/Llama-3.2-11B-Vision-Instruct model from Hugging Face, a multimodal variant of Llama 3.2 capable of:
- Processing both text and images simultaneously
- Understanding visual content and context
- Generating detailed natural language descriptions
- Providing rich, context-aware analysis of visual elements
Before you begin, ensure you have:
- Node.js (v18 or higher)
- A Hugging Face API key
- npm or yarn installed
Create a .env file in the root directory:
HUGGINGFACE_API_KEY=your_api_key_here
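Server-side code could fail fast when the key is missing or still set to the placeholder. The following is a minimal sketch; `requireApiKey` is a hypothetical helper and not necessarily part of this project. Only the variable name `HUGGINGFACE_API_KEY` comes from the example above.

```typescript
// Hypothetical helper (an assumption for illustration, not the project's code).
// Takes an environment-like object so it can be checked without touching the
// real process environment.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.HUGGINGFACE_API_KEY?.trim();
  // Reject both a missing value and the unedited placeholder from the .env example.
  if (!key || key === "your_api_key_here") {
    throw new Error(
      "HUGGINGFACE_API_KEY is missing or still set to the placeholder value",
    );
  }
  return key;
}
```

In a Next.js route handler this would typically be called as `requireApiKey(process.env)`, since Next.js loads .env files into `process.env` on the server.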
- Clone the repository:
git clone [repository-url]
cd [project-directory]
- Install dependencies:
npm install
# or
yarn install
- Start the development server:
npm run dev
# or
yarn dev
- Open your browser and navigate to http://localhost:3000
- Upload an image by dragging and dropping or clicking the upload area
- Wait for the AI to analyze your image
- View the detailed description in the text area on the right
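Before an upload is sent, a client might check the file type against the formats this README lists as supported (PNG, JPG, JPEG, GIF). The sketch below is one way to do that; `SUPPORTED_IMAGE_TYPES` and `isSupportedImage` are hypothetical names, and the project's own validation may differ.

```typescript
// MIME types corresponding to the supported formats (JPG and JPEG share one type).
const SUPPORTED_IMAGE_TYPES = new Set<string>([
  "image/png",
  "image/jpeg",
  "image/gif",
]);

// Hypothetical helper: accepts anything with a `type` field, such as a browser
// File or Blob, and reports whether its MIME type is in the supported set.
function isSupportedImage(file: { type: string }): boolean {
  return SUPPORTED_IMAGE_TYPES.has(file.type.toLowerCase());
}
```

A drop handler could call `isSupportedImage(file)` and show an error message instead of uploading when it returns `false`.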
Processes images and returns AI-generated descriptions.
Request:
- Method: POST
- Body: FormData
  - image: Image file (supported formats: PNG, JPG, JPEG, GIF)
  - prompt: Custom prompt for analysis (optional)
Response:
- Streams the AI-generated description as text/plain
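The request and response described above could be exercised from a client roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the endpoint path is passed in by the caller because it is not given here, and only the `image` and `prompt` field names and the streamed text/plain response come from the description above.

```typescript
// Builds the multipart body using the field names from the request description.
function buildRequestBody(image: Blob, prompt?: string): FormData {
  const body = new FormData();
  body.append("image", image);
  // The prompt is optional, so it is only sent when non-empty.
  if (prompt !== undefined && prompt.trim() !== "") {
    body.append("prompt", prompt);
  }
  return body;
}

// Reads a streamed text body chunk by chunk, reporting each decoded piece to
// onChunk and returning the full text once the stream ends.
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void = () => {},
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact.
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onChunk(tail);
  }
  return full;
}

// Hypothetical wrapper: `endpoint` is whatever route the project exposes.
async function describeImage(
  endpoint: string,
  image: Blob,
  prompt?: string,
  onChunk?: (text: string) => void,
): Promise<string> {
  const response = await fetch(endpoint, {
    method: "POST",
    body: buildRequestBody(image, prompt),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return readTextStream(response.body, onChunk);
}
```

Passing an `onChunk` callback lets a UI append text as it arrives instead of waiting for the complete description.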
This project is licensed under the MIT License - see the LICENSE file for details.
- Meta for the Llama 3.2 Vision-Instruct multimodal model
- Hugging Face for hosting the model and providing the inference API
- Next.js team for the amazing framework