January 14, 2024 – Dave Berry

Emerging Trends and Systems Implications of Multi-Modal AI Models

Source: https://arxiv.org/abs/2312.14385

Introduction

As generative AI continues to advance, models are moving beyond text generation to include image and video synthesis. However, these multi-modal models bring systems-level challenges that differ from those of traditional language models. A new paper from researchers at Meta and Harvard University provides the first in-depth analysis of the system performance and implications of text-to-image (TTI) and text-to-video (TTV) generative AI models. The analysis compares two main model architectures, Diffusion-based and Transformer-based, across eight representative models on dimensions such as latency, computational intensity, and per-component breakdown. The researchers make several key observations about the distinct

