
Latest AI News

Are AI agents ready for the workplace? A new benchmark raises doubts. | TechCrunch

The piece revisits Microsoft CEO Satya Nadella's two-year-old prediction about AI's potential to replace white-collar jobs. Despite advances in AI capabilities, the expected widespread displacement of roles in law, finance, and IT has not materialized, and a new benchmark of AI agents on workplace tasks underscores the gap. The article examines the challenges of integrating AI into these professions, suggesting that while AI can enhance productivity, it cannot yet replace the human element essential to knowledge work.

TechCrunch
Scaling NVFP4 Inference for FLUX.2 on NVIDIA Blackwell Data Center GPUs

NVIDIA has teamed up with Black Forest Labs (BFL) to scale inference of the FLUX.2 text-to-image model using NVFP4, a 4-bit floating-point format, on NVIDIA Blackwell data center GPUs. By reducing precision to 4 bits, the collaboration aims to cut memory use and latency, which could significantly improve real-time image generation for developers and creators leveraging AI-driven graphics.
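The NVFP4 format itself is NVIDIA-specific, but the core idea of low-bit floating-point storage can be sketched. The snippet below simulates round-to-nearest quantization onto a signed e2m1 (4-bit float) grid with one scale per block; the block size, the scale handling, and the `fp4_block_quantize` helper are illustrative assumptions, not NVIDIA's implementation.

```python
import numpy as np

# Representable magnitudes of a signed e2m1 (4-bit float) value.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_block_quantize(x, block_size=16):
    """Simulate quantize-dequantize onto an FP4 (e2m1) grid, one scale per block.

    Illustrative only: real NVFP4 kernels use hardware-defined block sizes
    and FP8-encoded scale factors, which this sketch does not reproduce.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, blk in enumerate(blocks):
        # Map the block's largest magnitude onto the grid's largest value.
        scale = np.abs(blk).max() / FP4_GRID[-1]
        if scale == 0.0:
            scale = 1.0
        # Round each scaled magnitude to the nearest representable FP4 value.
        mags = np.abs(blk) / scale
        idx = np.abs(mags[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        out[i] = np.sign(blk) * FP4_GRID[idx] * scale
    return out.reshape(-1)[: len(x)]

weights = np.array([0.01, -0.2, 0.75, -1.5, 0.0, 0.33, -0.9, 2.0])
deq = fp4_block_quantize(weights, block_size=8)
```

Per-block scaling is what keeps a 16-value grid usable: each block's largest weight defines the scale, so small and large weights in different blocks are quantized at different resolutions.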

Nvidia.com
CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback

Recent research tackles limited camera controllability in camera-controlled video diffusion models. The study introduces an efficient 3D decoder that lifts video latents and camera poses into 3D representations, optimizing pixel-level consistency for better camera-video alignment. The method addresses deficiencies in existing reward models while reducing computational overhead, and shows strong results on the RealEstate10K and WorldScore benchmarks. For more details, visit the [CamPilot page](https://a-bigbao.github.io/CamPilot/).

arXiv
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Research on Representation Autoencoders (RAEs) indicates they excel at large-scale text-to-image (T2I) generation, outperforming state-of-the-art Variational Autoencoders (VAEs) across model scales. RAEs show faster convergence, better generation quality, and greater stability during fine-tuning. This suggests RAEs could streamline T2I frameworks and strengthen multimodal models that unify visual understanding and generation.

arXiv
Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing

A new approach called Feature-space Smoothing (FS) has been proposed to enhance the robustness of multimodal large language models (MLLMs) against adversarial attacks. FS guarantees a certified lower bound on feature cosine similarity under $\ell_2$-bounded attacks. The addition of the Purifier and Smoothness Mapper (PSM) module further improves robustness without retraining. Experiments show that FS-PSM significantly reduces the Attack Success Rate from nearly 90% to about 1%, outperforming traditional adversarial training across various MLLMs and tasks.
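The paper's exact certificate is its own; as an illustration of where such a bound can come from, assume a feature extractor $f$ that outputs unit-normalized features and is $L$-Lipschitz (both assumptions made for this sketch). For any perturbation with $\|\delta\|_2 \le \epsilon$:

$$
\cos\big(f(x),\, f(x+\delta)\big) \;=\; 1 - \tfrac{1}{2}\,\|f(x) - f(x+\delta)\|_2^2 \;\ge\; 1 - \tfrac{1}{2}\,(L\epsilon)^2,
$$

using the identity $\|u - v\|_2^2 = 2 - 2\,u^\top v$ for unit vectors. Any verified Lipschitz constant for the feature space thus yields a certified floor on feature cosine similarity.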

arXiv
This OS quietly powers all AI - and most future IT jobs, too

ZDNET's latest piece argues that Linux is the dominant OS for AI workloads, with no serious alternative. Key players such as Canonical and Red Hat are central to this landscape, providing essential support and tooling for AI development. The article underscores the need for companies to adopt Linux for effective AI deployment and management.

ZDNet
A timeline of the US semiconductor market in 2025 | TechCrunch

The U.S. semiconductor industry saw significant upheaval in 2025, marked by leadership shifts at major companies and evolving discussions over AI chip export regulations. These developments highlight the sector's ongoing adaptation to geopolitical pressure and rapid technological change, shaping future strategies and competitive dynamics.

TechCrunch
Irony alert: Hallucinated citations found in papers from NeurIPS, the prestigious AI conference | TechCrunch

AI detection startup GPTZero analyzed 4,841 papers from the recent NeurIPS conference in San Diego and flagged 1,900 submissions, about 39%, as containing AI-generated content, including hallucinated citations. The findings highlight the growing use of AI in academic writing, raise concerns about authenticity and originality in research, and may prompt stricter guidelines for AI use in submissions.

TechCrunch
The US and China Are Collaborating More Closely on AI Than You Think

The US and China are locked in a competitive race in artificial intelligence, focusing on advancements in algorithms, models, and hardware. Despite their rivalry, collaboration persists in academic research, with expertise and resources being shared. This dynamic presents both opportunities and challenges, as national security concerns rise alongside innovation. The balance between competition and cooperation could shape the future landscape of AI development and regulation.

Wired
APPLE: Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping

Researchers have developed APPLE (Attribute-Preserving Pseudo-Labeling), a new face-swapping method that enhances identity transfer while maintaining key attributes like lighting and makeup. By treating face swapping as a conditional deblurring task and using a teacher-student framework for better supervision, APPLE delivers photorealistic results and sets a new standard in attribute preservation.

arXiv
Towards Understanding Best Practices for Quantization of Vision-Language Models

A study investigates the effectiveness of quantization methods, including GPTQ and AWQ, on multimodal pipelines that pair vision and language models. Results show that quantization choices for both the vision transformer (ViT) and the LLM matter for end performance, with lower-bit quantization of the LLM still maintaining high accuracy. The work offers practical guidance for reducing memory and latency when deploying multimodal language models. The code is available at https://github.com/gautomdas/mmq.
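Both GPTQ and AWQ are weight-only methods that improve on the same minimal baseline: per-channel round-to-nearest (RTN) quantization, sketched below. The `rtn_quantize` helper and the 4-bit setting are illustrative; GPTQ adds second-order error compensation and AWQ rescales salient channels using activation statistics, neither of which is reproduced here.

```python
import numpy as np

def rtn_quantize(W, bits=4):
    """Per-output-channel symmetric round-to-nearest (RTN) weight quantization.

    A minimal baseline only: real GPTQ/AWQ pipelines add calibration-based
    refinements on top of this round-and-rescale scheme.
    """
    qmax = 2 ** (bits - 1) - 1                         # e.g. 7 for 4-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                            # guard all-zero rows
    q = np.clip(np.round(W / scale), -qmax - 1, qmax)  # integer codes
    return q * scale                                   # dequantized weights

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
W_q = rtn_quantize(W, bits=4)
max_err = np.abs(W - W_q).max()
```

With a symmetric per-channel scale, the reconstruction error of each weight is bounded by half a quantization step, which is why studies like this one can push the LLM weights to low bit-widths before accuracy degrades.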

arXiv
Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks

Researchers have developed AdSent, a framework that hardens fake news detection against sentiment manipulation, a vulnerability they expose in large language model-based detectors. The study shows that altering an article's sentiment significantly degrades detection accuracy, with detectors tending to judge neutral-toned articles as genuine. AdSent employs a sentiment-agnostic training strategy and outperforms existing models in robustness and accuracy across multiple datasets.

arXiv