[pull] master from skynettoday:master #191

Merged 1 commit on Dec 19, 2024
Digest 299
andreykurenkov committed Dec 19, 2024
commit 4ba25fb2b11976019500dd0794feeb66e8f89458
127 changes: 127 additions & 0 deletions _posts/digests/2024-12-16-299.md
@@ -0,0 +1,127 @@
---
layout: redirect
title: "Last Week in AI #299"
excerpt: "Google's Project Mariner surfs the web for you 🏄‍♂️, Microsoft's Phi-4 solves math problems 🧮, DeepMind's Veo 2 creates 4K videos 🎥, Pika Labs' AI video generator gets a major upgrade 🚀, and more!"
image:
feature: assets/img/digests/299/deepmind.png?resize=1200,675
credit: <a href="<Image Source Link>"> <Author> / <Source Name> </a>
categories: [digests]
permalink: /digests/the-two-hundred-and-ninety-ninth
sidebartoc: true
redirect: https://lastweekin.ai/p/299
---

### Top News

#### [Google unveils Project Mariner: AI agents to use the web for you](https://techcrunch.com/2024/12/11/google-unveils-project-mariner-ai-agents-to-use-the-web-for-you/)
![](https://techcrunch.com/wp-content/uploads/2016/07/deepmind.png?resize=1200,675)

Google DeepMind has unveiled Project Mariner, an AI agent that interacts with the web on behalf of users. The Gemini-powered agent can control a Chrome browser, move the cursor, click buttons, and fill out forms, mimicking how a person uses a website. Currently in testing with a small group, it can perform tasks such as building a shopping cart from a grocery list or finding flights and hotels. However, it cannot fill out credit card information or accept cookies on behalf of users. It also works only in the active, foreground tab of Chrome, so users must watch as it performs each task. Google believes the agent represents a significant shift in user experience, potentially changing how users interact with the web.

#### [Microsoft debuts Phi-4, a new generative AI model, in research preview](https://techcrunch.com/2024/12/12/microsoft-debuts-phi-4-a-new-generative-ai-model-in-research-preview/)
![](https://techcrunch.com/wp-content/uploads/2024/07/microsoft-logo-office.jpg?resize=1200,798)

Microsoft has introduced Phi-4, the latest addition to its Phi series of generative AI models, which is particularly adept at solving math problems due to improved training data quality. The model, which consists of 14 billion parameters, is currently available in limited access on Microsoft's Azure AI Foundry development platform for research purposes. Phi-4's enhanced performance is attributed to the use of high-quality synthetic datasets and human-generated content, as well as unspecified post-training improvements. This release marks the first Phi-series model launch since the departure of Sébastien Bubeck, a key figure in Microsoft's Phi model development, who left the company to join OpenAI.


#### [Google DeepMind unveils a new video model to rival Sora](https://techcrunch.com/2024/12/16/google-deepmind-unveils-a-new-video-model-to-rival-sora/)
![](https://techcrunch.com/wp-content/uploads/2023/10/deepmind.jpg?resize=1200,675)

Google's AI research lab, DeepMind, has announced Veo 2, a next-generation video-generating AI that can create two-minute clips at resolutions up to 4K, surpassing OpenAI's Sora in both resolution and duration. Veo 2, which is available exclusively through Google's experimental video creation tool, VideoFX, has an improved understanding of physics and camera controls and produces clearer footage. It renders motion, fluid dynamics, and properties of light more realistically, and it can emulate different lenses and cinematic effects. The model still has limitations, and DeepMind is working with artists and producers to refine its video-generation models and tooling. The company also announced upgrades to Imagen 3, its commercial image generation model, which can now create brighter, better-composed images and photos in various styles.


#### [Pika Labs releases AI video generator 2.0 with new features](https://the-decoder.com/pika-labs-releases-ai-video-generator-2-0-with-new-features/)
![](https://the-decoder.com/wp-content/uploads/2024/12/pika_labs_20_cat_guy.png)

Pika Labs has launched version 2.0 of its AI video generator, introducing a feature called "Scene Ingredients" that lets users incorporate their own images into AI-generated videos. Users assemble a scene from visual components such as pictures of people, objects, clothing, or environments; the model then determines the role of each image and merges them into a coherent scene. The updated generator, which also offers enhanced visual quality and improved prompt adherence, will be accessible to all users, including those in the European Union. By contrast, OpenAI's Sora is only fully available to Pro subscribers. Pika Labs, founded by Stanford students Demi Guo and Chenlin Meng, has raised $80 million and is currently valued at $470 million.




### Other News
#### Tools
![](https://image.cnbcfm.com/api/v1/image/108074672-1733945784281-gettyimages-2156836169-FRIOS202406120002.jpeg?v=1733945863&w=1920&h=1080)

[Apple launches its ChatGPT integration with Siri](https://www.cnbc.com/2024/12/11/apple-launches-its-chatgpt-integration-with-siri.html) - Apple's latest software updates for iPhone, iPad, and Mac introduce a ChatGPT integration with Siri, enhancing its ability to handle complex queries while maintaining user privacy, and marking a significant step in Apple's AI strategy.

[ChatGPT’s AI search engine is rolling out to everyone](https://www.theverge.com/2024/12/16/24322665/chatgpt-search-engine-rolling-out-free-users) - OpenAI's ChatGPT search engine, now available to all users, includes an optimized mobile version with advanced voice mode and features resembling traditional search engines, such as location-based results with images and maps.

[Google Gemini can now do more in-depth research](https://techcrunch.com/2024/12/11/gemini-can-now-research-deeper/) - Google's upgraded Gemini platform introduces "Deep Research," a feature that uses advanced reasoning to compile comprehensive research reports, raising ethical concerns about its impact on education and publisher revenue.

[OpenAI brings its o1 reasoning model to its API — for certain developers](https://techcrunch.com/2024/12/17/openai-brings-its-o1-reasoning-model-to-its-api-for-certain-developers/) - OpenAI's o1 reasoning model, now available to select developers via its API, offers enhanced customization and accuracy but at a higher cost and with limited initial access.

[NVIDIA Unveils Its Most Affordable Generative AI Supercomputer](https://blogs.nvidia.com/) - NVIDIA's new compact generative AI supercomputer, enhanced by a software upgrade, delivers improved performance at a more accessible price point.

[ChatGPT adds live video access to "see" what your phone sees](https://www.axios.com/2024/12/12/chatgpt-video-screen-sharing-voice-chat)

[Meta AI Releases Apollo: A New Family of Video-LMMs Large Multimodal Models for Video Understanding](https://www.marktechpost.com/2024/12/16/meta-ai-releases-apollo-a-new-family-of-video-lmms-large-multimodal-models-for-video-understanding/) - Meta AI's Apollo models introduce innovative techniques like fps sampling and dual vision encoders to enhance video understanding, achieving strong performance across video-language tasks while offering scalable solutions for real-world applications.

[UAE's TII Launches Falcon 3: High-Performance Small AI Models](https://www.maginative.com/article/uaes-tii-launches-falcon-3-high-performance-small-ai-models/) - Falcon 3, launched by the UAE's Technology Innovation Institute, is a high-performance small AI model series that outperforms competitors like Meta's LLaMA, supports multiple languages, and is optimized for efficient operation on edge devices with limited resources.

[Meta debuts a tool for watermarking AI-generated videos](https://techcrunch.com/2024/12/12/meta-releases-a-tool-for-watermarking-ai-generated-videos/) - Meta has introduced Meta Video Seal, an open-source tool for watermarking AI-generated videos to combat the rise of deepfakes, offering resilience against common video edits and compression, while encouraging industry adoption through a public leaderboard and collaboration initiatives.

[Meta FAIR Releases Meta Motivo: A New Behavioral Foundation Model for Controlling...](https://www.marktechpost.com/2024/12/16/meta-fair-releases-meta-motivo-a-new-behavioral-foundation-model-for-controlling-virtual-physics-based-humanoid-agents-for-a-wide-range-of-complex-whole-body-tasks/) - Meta Motivo, developed using the FB-CPR algorithm, is a behavioral foundation model for humanoid control that excels in zero-shot learning tasks by leveraging forward-backward representations and conditional policy regularization to achieve human-like behavior and high performance across diverse tasks.

[Midjourney adds faster model customization and mood board support](https://the-decoder.com/midjourney-adds-faster-model-customization-and-mood-board-support/) - Midjourney's latest update enhances AI model customization with faster personalization, mood board support, and multiple model profiles, requiring fewer image ratings for effective use.

[OpenAI announces a ChatGPT organizing system called Projects](https://mashable.com/article/openai-announces-projects-to-organize-customize-chatgpt-convos) - OpenAI's new Projects feature in ChatGPT enhances user experience by allowing customization and organization of chats, integrating capabilities like Canvas support and web connection for tasks such as project management and personal website creation.

[X gains a faster Grok model and a new ‘Grok button’](https://techcrunch.com/2024/12/13/x-gains-a-faster-grok-model-and-a-new-grok-button/) - xAI has launched an upgraded Grok 2 chatbot model with enhanced speed and capabilities, introduced a new "Grok button" for contextual insights on X, and announced API improvements with reduced pricing and upcoming integration of the Aurora image generation model.

[Meta AI Introduces Byte Latent Transformer (BLT): A Tokenizer-Free Model That Scales Efficiently](https://www.marktechpost.com/2024/12/13/meta-ai-introduces-byte-latent-transformer-blt-a-tokenizer-free-model-that-scales-efficiently/) - Meta AI's Byte Latent Transformer (BLT) eliminates tokenization by processing raw byte sequences into dynamic patches, improving efficiency, scalability, and robustness in language models compared to traditional tokenization-based architectures.

[Google is adding a 'join' feature to its NotebookLM AI podcast generator, so you can become part of the show.](https://www.techradar.com/computing/artificial-intelligence/google-is-adding-a-join-feature-to-its-notebooklm-ai-podcast-generator-so-you-can-become-part-of-the-show) - Google's NotebookLM AI podcast generator now allows users to join and interact with AI hosts during podcasts, with additional customization features and a paid tier launching in 2025.

[Genesis](https://genesis-embodied-ai.github.io/) - Genesis is a versatile physics simulation platform that combines a universal physics engine, a fast robotics simulation platform, a photo-realistic rendering system, and a generative data engine to automate data generation for various AI applications.

#### Business
![](https://s.yimg.com/ny/api/res/1.2/jMecbx2ArowtaIfVpwJtfw--/YXBwaWQ9aGlnaGxhbmRlcjt3PTEyMDA7aD03NzA-/https://media.zenfs.com/en/bloomberg_technology_68/b94efb3e93afd12d9111347b677ad840)

[Databricks to Hit $62 Billion Valuation in New Funding Round](https://finance.yahoo.com/news/databricks-hit-62-billion-valuation-153858326.html) - Databricks is raising $10 billion in funding to reach a $62 billion valuation, with plans to invest in AI products, acquisitions, and international expansion while preparing for a potential future public offering.

[Salesforce plans to hire 2,000 people to sell its AI products](https://techcrunch.com/2024/12/17/salesforce-plans-to-hire-2000-people-to-sell-its-ai-products/) - Salesforce is significantly expanding its sales team to support the upcoming release of its second-generation AI agent software, with CEO Marc Benioff expressing unprecedented enthusiasm for the company's AI initiatives.

[Liquid AI just raised $250M to develop a more efficient type of AI model](https://techcrunch.com/2024/12/13/liquid-ai-just-raised-250m-to-develop-a-more-efficient-type-of-ai-model/) - Liquid AI is developing flexible and efficient liquid neural networks for various applications, with a significant investment from AMD to optimize these models for their hardware.

[Meta Urges California Attorney General to Stop OpenAI From Becoming For-Profit](https://www.wsj.com/tech/ai/elon-musk-open-ai-lawsuit-response-c1f415f8)

[Musk Offers Free Access to Grok-2 AI Chatbot to X Users](https://www.pymnts.com/artificial-intelligence-2/2024/musk-offers-free-access-to-grok-2-ai-chatbot-to-x-users/) - Elon Musk's xAI is rolling out the faster and more accurate Grok-2 AI chatbot with multilingual capabilities for free to all users on his social media platform, X, while offering premium users additional benefits and features.

[OnlyFans Models Are Using AI Impersonators to Keep Up With Their DMs](https://www.wired.com/story/onlyfans-models-are-using-ai-impersonators-to-keep-up-with-their-dms/) - AI-generated chatters are increasingly being used by OnlyFans creators to manage their direct message interactions, replacing human contractors and raising questions about job displacement and platform policies.

[OpenAI says Elon Musk wanted it to be for-profit in 2017](https://www.cnbc.com/2024/12/13/openai-says-elon-musk-wanted-it-to-be-for-profit-in-2017.html) - OpenAI claims Elon Musk initially supported a for-profit structure for the company in 2017 but later opposed its transition to a fully for-profit model after failing to secure majority control.

[OpenAI Releases Emails Showing Elon Musk 'Wanted An OpenAI For-Profit'](https://www.webpronews.com/openai-releases-emails-showing-elon-musk-wanted-an-openai-for-profit/) - OpenAI is firing back at Elon Musk and his lawsuit, saying the co-founder originally wanted OpenAI to be a for-profit company. Elon Musk is suing to prevent OpenAI from transitioning to a for-profit company.

#### Research
![](https://lh3.googleusercontent.com/PNlhxhf4LKLRCezIt7Ap358F91-vbK5dLp56Ak1FejpCZh3YTp6jGqIDJm9c0iAtx8Y73MCTu279c1k2GZkM2qXXaqx315NSOaSiU0y0ATMK2c2Hyw=w1200-h630-n-nu)

[FACTS Grounding: A new benchmark for evaluating the factuality of large language models](https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models/) - FACTS Grounding is a new benchmark designed to evaluate and improve the factual accuracy and grounding of large language models by measuring their ability to generate factually accurate and detailed responses based on provided source material.

[Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation](https://arxiv.org/abs/2412.03304v1) - Global MMLU addresses cultural and linguistic biases in multilingual evaluations by improving translation quality and evaluating cultural biases, resulting in a more comprehensive benchmark across 42 languages.

[UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling](https://arxiv.org/abs/2408.04810v1) - UniBench provides a unified framework for evaluating vision-language models across over 50 benchmarks, revealing that while scaling can enhance some capabilities, it is less effective for reasoning tasks, and highlights the importance of data quality and tailored learning objectives.

[DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding](https://arxiv.org/abs/2412.10302v1) - DeepSeek-VL2 introduces a dynamic tiling vision encoding strategy and Multi-head Latent Attention mechanism to enhance multimodal understanding, achieving state-of-the-art performance in tasks like visual question answering and optical character recognition.

[Multimodal Latent Language Modeling with Next-Token Diffusion](https://arxiv.org/abs/2412.08635v1) - Latent Language Modeling (LatentLM) effectively integrates continuous and discrete data using causal Transformers, outperforming existing models in multimodal tasks such as image generation and text-to-speech synthesis.

[FullStack Bench: Evaluating LLMs as Full Stack Coders](https://arxiv.org/abs/2412.00535v4) - FullStack Bench is a comprehensive code evaluation dataset designed to assess the capabilities of large language models in full-stack programming across multiple domains and languages, supported by the SandboxFusion tool for efficient performance evaluation.

#### Concerns
![](https://cdn.arstechnica.net/wp-content/uploads/2024/12/GettyImages-1354022389-1152x648.jpg)

[Character.AI steps up teen safety after bots allegedly caused suicide, self-harm](https://arstechnica.com/tech-policy/2024/12/character-ai-steps-up-teen-safety-after-bots-allegedly-caused-suicide-self-harm/) - Character.AI is implementing a separate model for teens and additional safety features, including content filtering and parental controls, in response to lawsuits alleging that its chatbots contributed to harmful behaviors in minors.

[Tesla is having major issue with its self-driving computer inside new cars](https://electrek.co/2024/12/16/tesla-major-issue-self-driving-computer-inside-new-cars/) - Tesla's new HW4 self-driving computers are experiencing failures due to potential short-circuiting issues, overwhelming service centers and raising safety concerns without an official recall or service bulletin.

[AI thought X-rays of your knees show if you drink beer—they don’t.](https://www.dartmouth-health.org/about/news/article/ai-thought-x-rays-your-knees-show-if-you-drink-beer-they-dont) - AI models in medical imaging can produce misleading results by exploiting unintended data patterns, highlighting the need for rigorous evaluation to prevent erroneous clinical insights.

#### Expert Opinions
![](https://cdn.vox-cdn.com/thumbor/SELuU01-YTH0RxoXeHlCS1Uupt4=/0x0:6000x4000/1200x628/filters:focal(3799x1547:3800x1548)/cdn.vox-cdn.com/uploads/chorus_asset/file/25789444/1258459915.jpg)

[OpenAI cofounder Ilya Sutskever says the way AI is built is about to change](https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training) - Ilya Sutskever predicts a shift in AI development due to the finite nature of data, leading to future AI systems that are more autonomous and capable of reasoning beyond current pattern-matching methods.

<hr>

Copyright © 2024 Skynet Today, All rights reserved.