(AINews) not much happened today

October 10, 2024

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


AI is all you need to be a chemist.

AI News for 10/8/2024-10/9/2024. We checked 7 subreddits, 433 Twitters and 31 Discords (228 channels, and 1872 messages) for you. Estimated reading time saved (at 200wpm): 222 minutes. You can now tag @smol_ai for AINews discussions!

Just a smattering of smol stories today:


all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Advancements and Industry News

  • Nobel Prize in Physics: @ilyasut announced that Geoffrey Hinton won the Nobel Prize in Physics for his contributions to AI. @demishassabis noted that Hinton “laid the foundations for the deep learning revolution that underpins the modern AI field.” The award was shared with John Hopfield, recognizing their work on neural networks and their connections to physics concepts.
  • Model Developments: @AIatMeta introduced a 13B parameter audio generation model as part of Meta Movie Gen, capable of generating high-quality audio synced to video. @rohanpaul_ai highlighted PMRF, a new photo-realistic image restoration algorithm.
  • AI Tools and Platforms: @AnthropicAI launched the Message Batches API, allowing processing of up to 10,000 queries asynchronously at 50% less cost than standard API calls (a minimal usage sketch follows this list). @togethercompute announced Flux Schnell, a new model available for free in their API for the next 3 months.
  • AI Research: @rohanpaul_ai discussed PrefixQuant, a new quantization technique that outperforms expensive per-token dynamic quantization. @rohanpaul_ai also highlighted a paper on Prompt Caching for low-latency inference using Prompt Markup Language (PML).
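For readers who want to try the Batches API, here is a minimal sketch assuming the anthropic Python SDK's beta batches interface at the time of writing; method names and the polling flow should be checked against Anthropic's current docs:

```python
# Hedged sketch of Anthropic's Message Batches API; verify method names
# against the current anthropic SDK docs before relying on this.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": f"q-{i}",  # your key for matching results later
            "params": {
                "model": "claude-3-5-sonnet-20240620",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": question}],
            },
        }
        for i, question in enumerate(["What is RLHF?", "Explain KV caching."])
    ]
)

# Batches run asynchronously (up to 10,000 requests, 50% of standard cost);
# poll until processing ends, then fetch the results.
print(batch.id, batch.processing_status)
```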

AI Engineering and Development

  • Development Tools: @svpino expressed frustration with switching between different code editors, highlighting the ongoing challenge for developers to find the perfect tool. @awnihannun showcased the MLX back-end in LM Studio, demonstrating its performance on an M1 laptop.
  • AI Frameworks: @hwchase17 announced “long-term memory” support in LangGraph, allowing for persistent document storage and content-based filtering across conversational threads.
  • AI Evaluation: @ShreyaR shared benchmarks comparing OpenAI’s DevDay Eval product and Bespoke Labs’ Minicheck for hallucination detection, with Minicheck showing better accuracy in detecting hallucinations.
  • AI Infrastructure: @_philschmid introduced Hex-LLM, a new LLM serving framework designed for TPUs, offering low-cost, high-throughput deployment for open models from Hugging Face.

AI Ethics and Societal Impact

  • Representation in Science: @mmitchell_ai emphasized the importance of men actively supporting gender equality in scientific fields, noting that women alone can only do so much, especially when they represent less than 10% of a field.
  • AI Governance: @bindureddy suggested that mainstream media and Hollywood want to regulate AI prematurely to protect their status as “celebrities,” viewing AI as a threat to their existence.

Memes and Humor

  • @DrJimFan shared a humorous “Hitchhiker’s guide to rebranding” for AI terms, mapping machine learning concepts to physics terminology.
  • @AravSrinivas posted an image comparing the difference between Google and Perplexity search results, highlighting the perceived superiority of Perplexity.
  • @jxmnop joked about the Nobel Prize in Physics being awarded to “ptrblock” for “fundamental contributions to physics,” playing on the unexpected nature of the actual award to AI researchers.

/r/LocalLlama Recap

Theme 1. Continuous Finetuning: A Novel Approach to Enhancing LLM Performance

  • Merging Llama 3.2 vision adapters onto 3.1 finetunes (Score: 40, Comments: 14): The post discusses merging Llama 3.2 vision adapters onto Llama 3.1 finetunes to improve capabilities, providing sample Python code for 8B/70B -> 11B/90B merges (a rough sketch of the recipe follows this list). Key considerations include skipping vision_model and cross_attn layers, handling new hidden layers (e.g., 20 new layers for 70B->90B), and addressing 8 new embeddings in the first embed layer; the author successfully merged a Hermes 70B lorablated model to create a 90B vision-capable model that retains ChatML features.
  • Im pretty happy with How my method worked out (Continuous Finetuning) Topped Open-LLM-leaderboard with 72b (Score: 150, Comments: 45): The author’s Continuous Finetuning method has topped the Open-LLM-leaderboard with a 72b model, demonstrating its effectiveness in preventing loss during AI model finetuning by combining new and previous weights. The method was applied to create Rombos-LLM-V2.5 AI models based on Qwen-2.5, which have achieved top or near-top performance across multiple leaderboard categories, as evidenced by the provided screenshots and a detailed write-up.
    • The Continuous Finetuning method involves three steps: instruct fine-tuning a base model, applying the adapter to a general instructed model, and merging the resulting models. This approach can effectively add domain knowledge to AI models.
    • Users expressed interest in the datasets used for training and the tools for model merging. The author recommended MergeKit for merging and provided links to MergeKit and Qwen-2.5 for further information.
    • A user tested Replete-LLM-V2.5-Qwen-14b using a personal benchmark for literary creativity, finding it performed in the 1st quartile for literary form and 2nd tertile for content, demonstrating consistent performance compared to other models.
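The merge recipe from the first post can be sketched roughly as follows. This is an illustrative reconstruction from the description above; the layer-name filters and file paths are assumptions, not the author's exact script:

```python
# Illustrative reconstruction of the vision-adapter merge described above;
# layer-name filters and paths are assumptions, not the author's exact script.
import torch

donor = torch.load("llama-3.2-90b-vision.pt")        # hypothetical state dict
finetune = torch.load("llama-3.1-70b-finetune.pt")   # hypothetical state dict

merged = {}
for name, weight in donor.items():
    if "vision_model" in name or "cross_attn" in name:
        # The vision stack and cross-attention layers exist only in 3.2: keep them.
        merged[name] = weight
    elif name in finetune and finetune[name].shape == weight.shape:
        # Language-tower weights shared with 3.1: take the finetune's version.
        merged[name] = finetune[name]
    else:
        # New hidden layers (20 extra for 70B->90B) and the 8 new embeddings
        # in the first embed layer have no 3.1 counterpart: keep the donor's.
        merged[name] = weight

torch.save(merged, "llama-3.1-finetune-90b-vision.pt")
```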

Theme 2. vLLM Outperforms llama.cpp in Distributed Inference Benchmarks

  • More than 70% faster distributed inference performance in the same machine: vLLM vs. llama.cpp, is it expected or can be improved? (Score: 44, Comments: 23): vLLM demonstrates roughly 70% faster distributed inference than llama.cpp on the same machine, raising the question of whether this is expected or whether llama.cpp can be improved (a minimal vLLM multi-GPU setup is sketched after this list). The comparison highlights the importance of efficient inference implementations for large language models.
    • vLLM’s performance advantage over llama.cpp is expected, with 70-80% faster distributed inference. Tests on a 4 x 4090 GPU workstation showed vLLM outperforming llama.cpp significantly in multi-GPU scenarios, while single-card performance was similar.
    • The performance gap is attributed to vLLM’s use of hand-written CUDA kernels and OpenMP, compared to llama.cpp’s reliance on standard C++ and BLAS libraries. Developers are considering adding custom kernels to llama.cpp, balancing performance gains with maintainability.
    • GPUStack, a framework supporting both vLLM and llama.cpp, was used for testing. Attempts to improve llama.cpp’s performance with the --split-mode row flag resulted in worse performance (26 tokens/sec) and uneven GPU utilization.
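As a reference point for reproducing such comparisons, a minimal vLLM tensor-parallel setup looks like this; the model name and GPU count are placeholders:

```python
# Minimal vLLM multi-GPU inference sketch; model and GPU count are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,  # split the model across 4 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```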

Theme 3. Microsoft’s Differential Transformer: A Breakthrough in LLM Attention

  • (New Quantization Algorithm) PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs (Score: 96, Comments: 10): PrefixQuant, a new static quantization method for LLMs, enables W4A4KV4 (4-bit weights, activations, and KV cache) inference while outperforming dynamic quantization techniques. This approach eliminates outliers and allows for efficient per-tensor static quantization of activations and KV cache, avoiding the costly per-token dynamic quantization used in previous methods to handle magnitude fluctuations across tokens.
    • Users expressed interest and excitement about testing PrefixQuant, with some skepticism about its performance claims. The community is eager to see the release of inferencing kernels for practical implementation.
    • Discussion arose about perplexity scores, comparing PrefixQuant to llama.cpp’s q4_K_M quantization. Users debated the comparability of results, noting differences in quantization methods and benchmarking conditions.
    • Detailed analysis of llama.cpp’s codebase revealed that q4_K_M quantization uses a mix of Q4 and Q6 precision, with higher precision for certain layers. This highlights the complexity of comparing different quantization methods based solely on file sizes.
  • (Microsoft Research) Differential Transformer (Score: 271, Comments: 65): Microsoft Research introduced the Differential Transformer, a novel architecture that improves Large Language Model (LLM) performance by replacing standard softmax attention with a differential attention mechanism that cancels attention noise. The paper reports gains over comparable Transformers in language modeling as well as in long-context retrieval, hallucination mitigation, and in-context learning, potentially advancing the field of natural language processing.
    • The Differential Transformer uses a novel attention mechanism that calculates attention scores as the difference between two separate softmax attention maps, effectively canceling noise and promoting sparse attention patterns (a toy single-head version is sketched after this list). This approach shows promising results in long-context modeling, hallucination mitigation, and in-context learning.
    • Users expressed excitement about the potential of this architecture, particularly for small models and instruction following. Some speculated on the impact of training large models from scratch with this architecture and then distilling them into smaller models for improved accuracy and cost-effectiveness.
    • The implementation is available on GitHub, including versions compatible with FlashAttention. However, new models need to be trained to benefit from this architecture, as it cannot be applied to existing weights.
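The core mechanism is compact enough to sketch. The following is a single-head toy version of differential attention as described above; the paper's full version is multi-head, with a learned, reparameterized λ and per-head normalization:

```python
# Toy single-head differential attention; the paper's version is multi-head
# with a learned, reparameterized lambda and per-head GroupNorm.
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam=0.5):
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    # Subtracting the second map cancels common-mode "attention noise",
    # promoting sparser attention patterns.
    return (a1 - lam * a2) @ v

q1, k1, q2, k2 = (torch.randn(8, 16) for _ in range(4))
v = torch.randn(8, 16)
print(diff_attention(q1, k1, q2, k2, v).shape)  # torch.Size([8, 16])
```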

Theme 4. Inflection AI Expands with New Models and Enterprise Offerings

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Breakthroughs

AI Model Releases and Improvements

Industry Developments

Expert Opinions and Predictions

AI-Generated Content and Tools


A summary of Summaries of Summaries by O1-mini

Theme 1. Advanced AI Model Performance and Optimization

  • SOAP Optimizer Outperforms AdamW: Users tested the SOAP optimizer on Alpaca, where it outperformed AdamW until AdamW’s learning rate was tuned. However, SOAP lacks support for distributed training and bf16 formats.
  • L-Mul Algorithm Slashes Energy Costs: The L-Mul algorithm approximates floating point multiplication with integer addition, reducing energy costs by up to 95% while maintaining higher precision than 8-bit floating point operations (a toy version of the underlying bit-trick follows this list).
  • Diff Transformer Enhances Attention Mechanisms: The Differential Transformer introduces a differential attention mechanism, improving long-context modeling and reducing hallucinations in tasks like question answering, outperforming traditional Transformers.
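L-Mul builds on the classic trick of approximating floating-point multiplication with integer addition on the raw bit patterns (Mitchell's logarithmic approximation). The toy below shows that underlying idea for positive float32 values; it is not the paper's exact algorithm, which adds a refined offset term:

```python
# Toy integer-addition multiply (Mitchell's approximation) for positive
# float32 values; L-Mul refines this idea - this is NOT the paper's algorithm.
import numpy as np

def approx_mul(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    xi = x.view(np.int32).astype(np.int64)
    yi = y.view(np.int32).astype(np.int64)
    # Adding raw bit patterns adds exponents (and roughly adds log-mantissas);
    # subtract the exponent bias (127 << 23 = 0x3F800000) so it is counted once.
    return (xi + yi - 0x3F800000).astype(np.int32).view(np.float32)

x = np.array([1.5, 2.0, 3.1415], dtype=np.float32)
y = np.array([2.0, 4.0, 0.5], dtype=np.float32)
print(approx_mul(x, y))  # ≈ [3.0, 8.0, 1.57...], close to x * y
```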

Theme 2. Infrastructure and Hardware Support for AI

  • Dual GPU Setup Limited by Performance: Using an RTX 3060 and RX 6600 provides 20GB VRAM but doesn’t boost speed. A second RTX 3060 may help load larger models without enhancing performance.
  • Apple MLX Integration in LM Studio 0.3.4: LM Studio 0.3.4 now supports Apple MLX, enabling efficient model execution on Apple Silicon Macs and allowing users to run larger models with enhanced compatibility (a standalone mlx-lm sketch follows this list).
  • External GPU Testing on Raspberry Pi 5: A user set up a GPU test rig on Raspberry Pi 5 with an AMD RX 460 and an amdgpu Linux kernel patch, aiming for 4K gaming and full external GPU support.
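For those trying MLX outside LM Studio, the standalone mlx-lm package offers a similar path on Apple Silicon. A minimal sketch, with an illustrative model repo name:

```python
# Minimal mlx-lm sketch for Apple Silicon; the model repo name is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
text = generate(
    model, tokenizer,
    prompt="Why is MLX fast on Apple Silicon?",
    max_tokens=100,
)
print(text)
```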

Theme 3. Challenges in Training and Fine-tuning AI Models

  • Training Vicuna-7B Faces CUDA Errors: Users encountered CUDA out-of-memory errors when training Vicuna-7B on Runpod, despite having 5 GPUs with 24GB VRAM each. Adjusting the DeepSpeed configuration resolved the issue (a generic ZeRO-3 offload sketch follows this list).
  • Aider’s Architect Mode Requires Refinement: Users reported that Architect Mode in Aider often fails to complete tasks, necessitating prompt adjustments for better planning and observation before coding.
  • DeepSpeed and Accelerate Configuration Issues: Members discussed resolving DeepSpeed configuration errors by ensuring device counts align with multiples required and using correct API parameters, streamlining the training process.
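One common shape of such a fix is enabling ZeRO-3 with CPU offload. Below is a hedged sketch via the Hugging Face Trainer; this is a generic configuration, not the exact one from the discussion:

```python
# Generic ZeRO-3 + CPU-offload configuration via the HF Trainer, a common
# remedy for CUDA OOM on multi-GPU 24GB setups; not the exact config discussed.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)
```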

Theme 4. Data Management, Security, and Scalability

  • Data Breach at Muah.ai Exposes 1.9M Emails: The AI girlfriend service Muah.ai suffered a data breach, exposing 1.9 million email addresses and sensitive prompts, including information related to child exploitation.
  • Model Merging at Scale Enhances Generalization: Research on model merging up to 64B parameters shows improved generalization and efficiency. Larger models enhance the benefits of merging, especially when combining multiple expert models.
  • AI Data Wall Concerns: As language models approach data limits, concerns about a data wall hindering AI progress emerge. Contrasting views suggest human reasoning can compensate for limited data exposure.

Theme 5. AI Tools, Integrations, and Community Research

  • Tool Integration with LangChain and Aider: Users explored integrating Livekit with LangChain for real-time capabilities and Aider for external LLM integrations, enhancing functionalities like RAG bots.
  • Llama Stack Unveils New Development Tools: Llama Stack tools released by Meta provide powerful resources for developers to optimize AI model capabilities, with GitHub repositories offering detailed examples and utilities.
  • Community Research and Nobel Prize Updates: The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John M. Jumper for contributions to computational protein design and AlphaFold2. Community discussions also reflected on AI research contributions and critiques, such as Schmidhuber’s insights on attribution.

O1-preview

Theme 1. AI Model Advancements and Releases

Theme 2. AI Tools and Integration Challenges

  • Cline AI Assistant 2.0 Streams Responses into Your Editor: The new Cline AI Assistant 2.0 introduces features like streamed responses directly into editors and a cancel button for task management. Users note a 40% reduction in requests due to an XML-based tool-calling prompt.
  • Aider Struggles with File Management and External LLMs: Users reported that Aider doesn’t auto-populate new files in the list without manual commits. Attempts to integrate external models like SambaNova require manual API configurations, highlighting integration challenges.
  • OpenAI Realtime Console Makes Voice API Accessible: A demo repository helps users test OpenAI’s new Realtime Voice API with a simple npm start, although one user incurred $3.87 in charges for 15 minutes of use.

Theme 3. AI in Research and Recognition

  • Nobel Prize in Chemistry Honors Computational Innovators: The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John M. Jumper for breakthroughs in computational protein design and protein structure prediction via AlphaFold2.
  • Debate Over AI Attribution in Nobel Prizes: Controversy arose as figures like Schmidhuber criticized the Nobel Committee for overlooking significant contributors in AI, sparking discussions about proper attribution in scientific achievements.
  • Scaling Laws Debate: Square Root vs. Fourth Root: Members debated scaling laws in AI, contrasting new proposals for square root scaling against Kaplan’s established 0.28 constant suggesting fourth-root scaling.

Theme 4. AI for Creative and Emotional Engagement

  • Emotional State Machines Make AI More Sentient: Developers are building AI with persistent emotional states, allowing bots to reflect user sentiments over time. This contrasts with typical bots that reset emotions after each interaction.
  • AI’s Role in Mental Health Support Under Scrutiny: Discussions highlighted the potential and challenges of using AI chatbots for mental health, with concerns about censorship policies limiting the AI’s ability to handle emotional nuances effectively.
  • Innovative Techniques Enhance AI Roleplay Experiences: Users shared methods for erotic roleplay (ERP) with AI, focusing on detailed character creation and immersive storytelling, though these practices raise ethical considerations.

Theme 5. Technical Challenges and Solutions in AI Development

  • LM Studio Users Grapple with Model Loading Issues: Upgrading to LM Studio 0.3.4 led to problems loading models like Llama 3.2. Switching to the Vulkan backend was suggested as a workaround.
  • HBM’s Performance Doesn’t Meet Expectations: Discussions revealed that HBM memory isn’t significantly reducing power consumption or costs. The bottleneck in supplying more H100 GPUs is linked to packaging requirements.
  • Torchao Encounters Quantization Hiccups: Integrating torchao with frameworks like ComfyUI led to operator errors, particularly on Windows. These issues highlight the complexities of quantization and compatibility in AI workflows.



  • Nvidia launches high-efficiency models: Nvidia introduced the Nemotron 51B, a NAS-optimized model achieving 2x throughput on a single H100 GPU while preserving accuracy. Users can test the model via NVIDIA’s API or download it from Hugging Face.
    • The release included several variants, like NVLM 1.0, aimed at bolstering AI capabilities.
  • Meta releases improved VLMs: Meta launched new vision models, including CoTracker 2.1, capable of tracking 70k points on a single GPU for video motion prediction, with an accompanying paper available here.
    • The updated SAM 2.1 model for image/video segmentation offers enhanced functionality for developers.
  • Insights into Mira’s Decentralization: A member introduced Mira, a decentralized infrastructure making AI accessible, emphasizing its community-driven projects without crypto involvement. Despite its technical potential, some users raised moral concerns regarding blockchain associations.
    • The discourse illustrated a growing tension over integrating such technologies in AI development.
  • Evaluating Diffusion Model Training Techniques: Members clarified that the diffusers library supports various diffusion models, noting Stable Diffusion XL and Flux as capable integrations.
    • Discussions also covered training Flux LoRAs using gguf formats, despite current limitations on model support.
  • Fine-tuning Whisper Model for ATC: A blog post details the fine-tuning of a Whisper model on air traffic control communications, achieving an 84% performance improvement by reducing the word error rate (WER) from 94.59% to just 15.08%.

  • CMD-R Temperature Tweaks: Members highlighted optimal temperature settings for CMD-R, recommending 0.3 for deterministic outcomes and 0.8 for creative tasks, with concerns about generation costs.
    • Suggestions included generating with 0.8 and then formatting with 0.1 to balance creativity and cost (a two-pass sketch follows this list).
  • API Connection Hiccups: Intermittent issues with the Cohere API were reported, with one member resolving them by accessing response.message.content(0).text after a brief debugging frenzy.
    • Members speculated that recent changes in the API might be a factor, sharing troubleshooting experiences and code adjustments.
  • Innovative Emotional State Machine: A new emotional state machine aims to track user emotions with persistent memory, keeping assistant bots in tune with user sentiment.
    • Unlike typical bots that reset after each exchange, the assistant remains in an emotional state reflective of past user interactions.
  • Advanced RAG in Banking: A user detailed their experiments with a RAG solution yielding 75% recall@5, outperforming OpenAI for banking applications by embedding 2000 chunks.
    • They aim to use this as a proof of concept for the bank, showcasing the feasibility of their solution.
  • AI’s Role in Mental Health Support: Discussion turned to the use of AI chatbots in mental health contexts, highlighting their value when human therapists are unavailable while noting challenges with emotional context.
    • Concerns emerged around censorship policies that limit these bots’ ability to interpret complex emotional nuances, impacting their effectiveness.
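The generate-hot, format-cold pattern mentioned above can be sketched with the Cohere Python SDK (v1-style co.chat; parameter names may differ in newer SDK versions):

```python
# Two-pass pattern from the discussion, sketched with the Cohere v1 SDK;
# parameter names may differ in newer SDK versions.
import cohere

co = cohere.Client()  # reads CO_API_KEY from the environment

draft = co.chat(
    model="command-r", temperature=0.8,
    message="Brainstorm three taglines for a coffee shop.",
)
final = co.chat(
    model="command-r", temperature=0.1,
    message=f"Format these as a numbered list:\n{draft.text}",
)
print(final.text)
```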

  • Aider struggles with File Management: Users faced issues with Aider not auto-populating new files in the file list, requiring the use of /commit or specifying file paths directly to see changes.
    • Another user pointed out that files must be committed to the git repository to be available in autocomplete, underlining the importance of version control.
  • Integrating External LLMs is a Challenge: Community members discussed the difficulty of integrating SambaNova models with Aider, suggesting manual API configuration for OpenAI-compatible endpoints.
    • Further inquiries revealed methods for adding model pricing and token costs through metadata JSON files, yet some configurations still posed issues.
  • Architect Mode needs Refinement: Concerns emerged regarding Aider’s Architect mode, which often fails to complete tasks fully, necessitating user intervention to continue.
    • Users suggested modifying prompts for better planning and observation before coding to enhance the effectiveness of this mode.
  • OpenAI Realtime Console makes voice API accessible: A demo repository for the OpenAI Realtime Console was successfully set up, simplifying access to the new voice API announced at DevDay.
    • Interacting via voice incurs costs: one user noted charges of $3.87 for 15 minutes of use, which raised concerns about testing expenses.
  • Cline AI Assistant 2.0 breaks new ground: The newly released Cline AI Assistant 2.0 boasts features like streamed responses directly into the editor and a cancel button for task management, enhancing usability.
    • Users highlighted the XML-based tool-calling prompt, which reportedly reduces requests by 40%, making resource use more efficient.

  • Nobel Prize in Chemistry Celebrates Computational Advances: The 2024 Nobel Prize in Chemistry has been awarded to David Baker for computational protein design and jointly to Demis Hassabis and John M. Jumper for protein structure prediction, as announced in the Nobel Prize tweet.
    • Members celebrated this milestone but expressed skepticism about its implications for future innovations in AI.
  • PRMs Under Scrutiny Amid Development Changes: The lack of research on PRMs was humorously noted, with members pointing out there is ‘almost none on PRMs, almost a billion as LLM as a judge’.
    • Concerns emerged regarding the patenting process in ML, with suggestions that companies often file defensively, leading to vague claims and unresolved disputes.
  • Schmidhuber Takes Aim at AI Attribution Issues: Criticism arose concerning the Nobel Prize in Physics 2024, with Schmidhuber alleging plagiarism and misattribution in works by Hinton and collaborators and claiming significant contributions were overlooked.
    • The mix of sentiments reflected the community’s reaction to the historical significance of AI contributions, as highlighted by user comments about Schmidhuber’s critique.
  • ButtBench Alignment Project Gets a Logo: The ButtBench Alignment Project unveiled a new logo, marking a visual identity for a project that has reached SOTA, though still far from human performance, as noted by Luca Soldaini.
    • The move signals a push for recognition and clarity in the project’s goals, resonating well with the community.
  • Data Wall Looms in AI Development: Members discussed a looming data wall threatening progress in language models, with current offerings nearing data limits and raising questions about reliance on ever-larger data volumes.
    • Contrasting opinions suggest human performance is not solely dependent on extensive data exposure, hinting at a philosophical divide on AI efficiency.

  • Profit Model Queries at Perplexity AI: Concerns arose regarding how Perplexity AI generates profit, particularly with student discounts in play, making the business model appear precarious.
    • sneakyf1shy humorously suggested that venture capital might be the backbone of their operations, hinting at potential long-term uncertainties.
  • Complexity Extension Packs a Punch: The newly launched Complexity extension enhances the Perplexity experience with options for customizable themes and markdown exports, leading some to call it ‘like Perplexity on steroids.’
    • Feline and asura0_00 praised the extension for significantly boosting user interactivity.
  • Perplexity AI Shortens Responses: Users noticed a trend toward more condensed responses from Perplexity AI, raising concerns that answers may lack depth.
    • Speculation suggests these changes could be tied to adjustments in token limits, affecting the quality of responses.
  • Meta’s Movie Maker Rocks: Meta has launched a movie generation tool, enabling users to create short films using AI, which aims to enhance storytelling.
    • This development showcases the potential of AI in creative domains.
  • Frustrations with Citation API Access: Members raised concerns regarding unanswered requests for whitelisting on the citation API, citing multiple attempts via various channels with no feedback.
    • A growing sense of frustration is evident among users awaiting updates.

  • ControlNet Models Simplified: A member shared a GitHub link regarding ControlNet models, suggesting users focus on practical examples while skimming the mathematical explanations.
    • ‘Scroll a bit down, ignore the math and look at the examples.’
  • Flux Inpainting’s Fast Track: In discussions about Flux and Schnell inpainting models, one member noted that using the recommended settings should reduce processing time to 1-2 minutes, compared to one user’s experience of 25 minutes.
    • The community highlighted key differences in iteration counts that affect Flux dev and Schnell performance.
  • Craving Kaggle Notebooks for Image Generation: A call went out for a Kaggle notebook for Automatic1111, shedding light on the community’s demand for structured guides.
    • Members reflected on the difficulty of locating specific notebooks for seamless image generation workflows.
  • Distilled CFG Confuses the Masses: Discussions on the nature of distilled CFG clarified that it serves as guidance distinct from standard CFG, arising from specific model training.
    • Community members noted that while Flux dev enhances CFG usage, it currently does not support negative prompts.
  • Deforum After Colab Restrictions: A Plan: Inquiries about using Deforum after the Colab restrictions prompted discussions about alternatives for accessing computing power, particularly renting GPUs.
    • Suggestions included using RunPod for GPU rental as a feasible solution.

  • Nobel Prizes Ignite AI and Chemistry Debate: Recent discussions highlighted the relevance of the Nobel Prize awards for AI figures such as Hinton and Hopfield, questioning their impact on the traditional physics and chemistry fields.
    • Opinions were split; while some feared a dilution of the awards’ prestige, others argued that innovation and enthusiasm should drive selection.
  • PhD Candidates Push Back on Publication Metrics: Frustration emerged over the pressure of publication metrics in PhD programs, which some believed created a daunting competitive environment.
    • Members proposed that effective networking might be a better strategy for securing mentorship and collaborations than chasing publication counts.
  • Web3 to Web5 Transition Confuses: Debate arose over the jump from Web3 to Web5, likening the naming strategy to the Fibonacci sequence and prompting speculation about future iterations like Web8.
    • Conversations turned humorous, with members joking about the absurdity of the progression.
  • Scaling Laws Debate Engulfs Members: One member shared an overview stating that cross-entropy loss decreases with a quadratic increase in compute, referencing an article that proposes square-root scaling.
    • This was contested with Kaplan’s laws, whose constant of roughly 0.28 suggests something closer to fourth-root scaling.
  • Spotlight on 0-shot COT Models: A focus emerged on the widespread adoption of 0-shot COT variants in recent model releases, hinting at a shift in evaluation methodologies.
    • While members pondered potential evaluation implementation details, no specific techniques were mentioned.

  • HBM’s Performance Compared to Expectations: Concerns were raised that HBM is not performing better than expected, still representing a HUGE cost in products like the H100 while not significantly reducing power consumption.
    • The key bottleneck in supplying more H100s was identified as packaging requirements.
  • GPT2 Training Encounters TypeError: A member reported a TypeError while running GPT2 training, caused by the normal_() function in PyTorch 2.0.0 rejecting an unexpected keyword argument ‘generator’ (a plausible workaround is sketched after this list).
    • Discussion suggested understanding the complexities of training, including initialization and forward/backward passes.
  • Seeking Libraries for WebGPU Testing: A community member is seeking recommendations for libraries for testing WebGPU, currently using Vitest and Playwright but facing flaky test runs.
    • They suspect the issue might stem from Playwright not properly clearing resources between test runs.
  • Gearing Up Raspberry Pi 5 for 4K Gaming: After witnessing Pineboards’ 4K demo, a member decided to set up a GPU test rig on a Raspberry Pi 5 with the amdgpu Linux kernel patch.
    • They aim for full external GPU support and shared insights on how to apply the patch.
  • Launch of FusedLinearJSD: A recent pull request introduced FusedLinearJSD, enabling efficient handling of the final linear layer by avoiding materialization of the large logits tensor.
    • This optimizes both the forward and backward pass for improved execution, mirroring the fused linear CE approach.
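Assuming the error comes from torch.nn.init.normal_ not yet accepting a generator argument on PyTorch 2.0.0, one plausible workaround is falling back to the tensor method, which has long taken one:

```python
# Plausible workaround, assuming nn.init.normal_ only gained `generator`
# in a later PyTorch release; Tensor.normal_ has accepted one for longer.
import torch

gen = torch.Generator().manual_seed(42)
w = torch.empty(768, 768)

try:
    torch.nn.init.normal_(w, mean=0.0, std=0.02, generator=gen)
except TypeError:  # PyTorch 2.0.0 path
    with torch.no_grad():
        w.normal_(0.0, 0.02, generator=gen)
```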

  • Choosing Between ChatGPT and Claude Subscriptions: A member advised against subscribing to ChatGPT for features in preview due to usage caps, although access to the GPT-4 legacy and 4o models might be beneficial.
    • They stressed that subscriptions should allow full functionality rather than limiting preview access.
  • Understanding O1 vs. O1 Mini Models: Members compared the O1 models, which act as ‘reasoners’, to 4o, highlighting O1’s limit of 50 uses per day versus 80 uses per 3 hours for 4o.
    • The discussion included plans for A/B testing between the two models to determine performance differences.
  • Theoretical Exploration of AI Evolution: A theory on the evolution of AI consciousness was entertained, emphasizing re-training and fine-tuning for advancement in capabilities.
    • Conversations swirled around the commercial viability of these evolved AI models and potential business models to support them.
  • User quits ChatGPT over rewriting responses: A user expressed frustration with ChatGPT’s habit of rewriting responses, causing them to stop using it for several months.
    • They noted the headaches the rewriting issue caused, which continued even when they asked it to stop.
  • Possible solutions discussed for ChatGPT: Another member suggested that the rewriting behavior might relate to Canvas or DALL-E prompts and provided a workaround for DALL-E use.
    • They recommended the phrasing ‘Make an image using these exact words: (your words)’ to avoid the rewriting problem.

  • Kainan offers free compute resources: Kainan expressed willingness to provide free compute resources for a competition, sparking interest from members.
    • Despite the enthusiasm, some uncertainty arose regarding how many participants would actually take up the offer.
  • 2024 Nobel Prize awarded for Protein Research: The Royal Swedish Academy of Sciences awarded the 2024 #NobelPrize in Chemistry to David Baker and to Demis Hassabis & John M. Jumper for their contributions to computational protein design and structure prediction, as reported here.
    • This recognition underscores the pivotal advancements in protein research within the AI community.
  • LM Studio boosts performance with Apple MLX: The new LM Studio 0.3.4 is out, featuring support for Apple MLX and allowing efficient model execution on Apple Silicon Macs.
    • Users are thrilled by the improvements in running larger models and the potential capabilities provided by MLX.
  • LLM360 launches massive pre-training dataset: LLM360’s new dataset boasts 15 trillion tokens, ensuring rigorous data quality through thorough filtering techniques.
    • This initiative focuses on enhancing training quality for LLMs, emphasizing deduplication and superior dataset structuring.
  • Llama Stack reveals new development tools: A member highlighted the new Llama Stack tools released by Meta, finding them pretty powerful.
    • This showcases an emerging interest within the community in using advanced tools to optimize AI model capabilities.

  • Prompt Caching: The Good and the Bad: Members discussed the mechanics of prompt caching, noting it can be problematic for changing contexts or short prompts. One member remarked, ‘You cannot disable prompt caching for those providers who do automatic prompt caching,’ pointing out critical limitations.
    • This sparked a debate on when and how to effectively utilize prompt caching without compromising performance.
  • Curiosity about Inflection 3.0: The anticipated launch of Inflection 3.0 has generated buzz, particularly regarding its integration with Intel Gaudi 3 for better performance. Despite the excitement, some members expressed skepticism about the lack of concrete benchmark data.
    • Concerns were raised that the hype might overshadow the actual performance improvements and real-world applications.
  • Understanding OpenRouter API Rate Limits: Clarifications on OpenRouter API limits reveal they are dynamic and depend on account credits. One member shared a GET request example demonstrating how to check rate-limit status and the credits associated with an API key (see the sketch after this list).
    • This guidance is crucial for optimizing API usage while ensuring compliance with request limits.
  • NotebookLM Podcast Gains Traction: Participants shared positive feedback on the NotebookLM Deep Dive podcast, highlighting its utility during commutes alongside accompanying notebooks. One user noted a desire for automation tools like ai-podcast-maker, stating, ‘automation ftw.’
    • This discussion underscores the growing trend of integrating audio content into daily workflows for enhanced learning.
  • Gemini Moderation Worries Surface: Concerns arose about Gemini potentially moderating inputs, raising fears of user bans over specific content. This initiated a broader dialogue on user experience and content moderation policies within AI frameworks.
    • Participants emphasized the need for transparency in moderation practices to ensure positive engagement from users.
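The rate-limit check works roughly like this against OpenRouter's key-info endpoint; the exact response fields should be confirmed against their current documentation:

```python
# Checking OpenRouter rate limits and credits for an API key; confirm the
# response fields against OpenRouter's current documentation.
import os

import requests

resp = requests.get(
    "https://openrouter.ai/api/v1/auth/key",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json())  # includes usage, credit limit, and rate-limit info
```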

  • LlamaIndex Workflows Tutorial Brilliance: A detailed tutorial illustrates how to implement Workflows in LlamaIndex, contrasting it with LangGraph and aiding in the creation of AI research agents.
    • It includes practical debugging and optimization tips, ensuring a smoother implementation experience.
  • LlamaCloud’s Financial Data Superpower: In a recent demo, the team showcased how to use LlamaCloud and LlamaParse to automate the filling of financial spreadsheets across multiple companies.
    • This highlights the substantial contribution of LLMs in streamlining data handling and analysis processes.
  • SFTechWeek Meetup on Multi-Agent Workflows: A reminder to RSVP for the in-person gathering at LlamaIndex HQ during #SFTechWeek, focusing on implementing multi-agent workflows in real production environments.
    • Participants are promised insights on RAG systems and production challenges, alongside food and networking opportunities. RSVP here.
  • Build Your Own AI Agent with OpenAI: A demonstration by the team allowed users to interact with an AI agent in real time using the OpenAI Realtime API client, showcasing voice interaction capabilities.
    • This open-source tool opens doors for developers to create personalized voice agents seamlessly, with examples provided for ease of use.
  • Semantic Chunking Conundrum in TypeScript: A user sought guidance on implementing semantic chunking in TypeScript, referencing a comparable example in Python for context (sketched after this list).
    • They expressed frustration with the lack of available resources and sparked discussion of community solutions.
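On the Python side, the pattern the user referenced looks roughly like LlamaIndex's semantic splitter, sketched below; an equivalent TypeScript API may require a custom implementation:

```python
# Semantic chunking with LlamaIndex's Python splitter, the kind of example
# the TypeScript user was trying to reproduce.
from llama_index.core import Document
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

splitter = SemanticSplitterNodeParser(
    buffer_size=1,                       # sentences grouped per embedding
    breakpoint_percentile_threshold=95,  # split where similarity drops
    embed_model=OpenAIEmbedding(),
)
nodes = splitter.get_nodes_from_documents(
    [Document(text="Long document text goes here...")]
)
print(len(nodes))
```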


  • DOM Data Attributes Enhance HTML Elements: The DOM allows data storage on elements via custom attributes prefixed with data- (e.g. data-myattribute), improving data handling in HTML.
    • This encourages innovative techniques for manipulating data directly via the DOM.
  • WebAssembly Component Model Repository Launched: The repository for the WebAssembly Component Model has been shared, detailing its design and specifications.
    • It provides essential insights for developers interested in the component-model aspects of WebAssembly.
  • Mojo’s GPU Support Sparks Excitement: Anticipation builds around the upcoming GPU support in Mojo, promising enhanced performance capabilities.
    • Community members are exploring integrating PyTorch with Mojo to optimize usage of GPU resources.
  • Mojmelo Brings Scikit-learn to Mojo: The Mojmelo project aims to implement machine learning algorithms in pure Mojo, providing an alternative to the Cython dependencies in Scikit-learn.
    • This initiative may significantly streamline running Scikit-learn-style workflows through Mojo.
  • Mojo Graph Performance Concerns: Performance tests showed total graph compile times of 0.312s and 0.451s, raising concerns about slower debugging iterations.
    • Suggestions to reuse the inference session could mitigate these compile-time issues, addressing potential performance penalties from using List types.

  • Lab Assignments Officially Released: The lab assignments for the course are now live, with the first task focused on using the Autogen framework to analyze restaurant reviews, due December 12, 11:59pm PST (a toy Autogen setup is sketched after this list).
    • Subsequent labs will address prompt engineering for LLM security, emphasizing the creation of attack and defense prompts.
  • Sign Up for Course Made Simple: Prospective students can easily join the course by filling out this form.
    • Engagement is encouraged in the LLM Agents Discord for further collaboration.
  • Lab 1 Download Issues Reported: Users encountered problems downloading the Lab 1 instructions, receiving empty files, while the other labs work correctly.
    • It was pointed out that the file is accessible on Google Drive despite having no preview.
  • Reinforcement Learning’s Impact on AGI Debated: Concerns arose regarding the relevance of Reinforcement Learning (TD learning) to achieving AGI, with some questioning whether agents can thrive without it.
    • The discussion highlighted RL’s role and efficacy in modern AI architectures.
  • Call for Collaborative Learning: Members encouraged peer collaboration for brainstorming while tackling assignments, aiming for a shared learning experience.
    • This encouragement is seen as a way to foster camaraderie and improve understanding of complex LLM concepts.
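A two-agent Autogen setup of the kind the lab describes can be sketched as follows (pyautogen-style API; the actual lab scaffolding and configuration will differ):

```python
# Hedged pyautogen-style sketch; the actual lab scaffolding will differ.
# Assumes OPENAI_API_KEY is available in the environment.
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    "review_analyst",
    system_message="Summarize restaurant reviews and score the food 1-10.",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
)
user = UserProxyAgent(
    "user", human_input_mode="NEVER", code_execution_config=False
)
user.initiate_chat(
    assistant,
    message="Reviews: 'Great tacos, slow service.' 'Best burrito in town.'",
    max_turns=1,
)
```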



  • Exploring Memcached Support in LangChain: A member is investigating whether adding support for pymemcache in LangChain is enough, or whether supporting a broader range of clients like python-memcached or pylibmc would be beneficial.
    • The goal is to improve caching flexibility within LangChain, making it more adaptable to different caching needs.
  • LiteLLM’s Streaming and Caching Issues: Concerns arose about LiteLLM not retrieving cached tokens while streaming, leading to a query about best practices for ensuring effective caching.
    • Resources on LiteLLM were shared, suggesting that token-stream responses may disrupt caching mechanisms.
  • SQL Query Limitations in AI: A user raised issues with limiting SQL queries to specific IDs without relying on LLM instructions, looking for stricter query-generation methods.
    • Another member recommended grouping by ID to improve filtering and achieve more reliable results.
  • SQL Chain Compatibility with Other Models: A question was raised regarding the performance of the SQL chain with models other than GPT 3.5, which often return inaccurate results.
    • One member found success using 4o-mini by focusing on precise column naming and careful question formulation (a sketch follows this list).
  • Integrating Livekit for Real-time LangChain Functions: Interest was expressed in integrating Livekit with LangChain to enhance its real-time capabilities for advanced applications.
    • The member specifically mentioned plans to develop a RAG bot, showcasing their ambitions for progressive application development.
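The 4o-mini approach mentioned above maps onto LangChain's SQL query chain roughly like this; the database URI and question are placeholders:

```python
# LangChain SQL query chain with gpt-4o-mini; URI and question are placeholders.
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///example.db")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = create_sql_query_chain(llm, db)
# Precise column names in the question help the model generate stricter SQL.
sql = chain.invoke({"question": "How many orders did customer_id 42 place?"})
print(sql)
```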

  • Get Ready for Mozilla AI Talk!: Next week, we’re excited to host a talk from a member of Mozilla AI discussing intriguing open source initiatives. Don’t miss out on this opportunity to learn more!
    • You can join the event here to catch the insights.
  • Confusion Over --stdin Flag: A user expressed confusion about how to use the --stdin flag and mentioned they couldn’t find guidance in the docs, highlighting a documentation clarity gap.
    • Further clarification is needed to help users utilize this feature effectively.
  • LLMs Stay Deterministic with Same Seed: A discussion revealed that LLMs can be deterministic if the same seed and input are used, contrary to popular belief; ChatGPT randomizes the seed on each request to introduce non-determinism (a test sketch follows this list).
    • Using the same inputs and setting temperature to 0 should yield consistent results.
  • Unpredictability with Model Updates: Concerns were raised that model updates in ChatGPT may affect result consistency over time, as changes in the model could disrupt previously deterministic behavior.
    • Users emphasized that updates might introduce unpredictability even when the code remains static.
  • Code Outcome Variability Across Systems: A member pointed out that updates to systems or Python could influence code behavior, resulting in variable outcomes; for instance, accessing user tokens could alter the execution path.
    • This variability underscores the importance of a controlled environment for consistent results.
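The determinism point can be tested directly with the OpenAI API's seed parameter; OpenAI documents seed as best-effort reproducibility, so comparing system_fingerprint across runs also helps catch backend changes:

```python
# Same seed + same inputs + temperature 0 should yield repeatable output;
# `seed` is best-effort, so also compare system_fingerprint across runs.
from openai import OpenAI

client = OpenAI()
for _ in range(2):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name three prime numbers."}],
        temperature=0,
        seed=1234,
    )
    print(resp.system_fingerprint, resp.choices[0].message.content)
```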

  • Clang Backend Errors in Tinygrad: A user encountered an error using exo on Linux with the clang backend, including a lowering error with MetaOps.KERNEL that reproduces on two systems and is possibly linked to Nix packaging issues.
    • Additionally, running with TINYGRAD_DEBUG=2 logged hundreds of operations before crashing, revealing detailed activity prior to the failure.
  • Introducing Fashion MNIST for Tinygrad Learners: A member proposed a pull request adding Fashion MNIST as a new dataset, a stepping stone in complexity between MNIST and CIFAR-10 for tinygrad learners.
    • The initiative reflects the community’s eagerness to expand learning resources, prompting discussions about further datasets to enrich training experiences.
  • Expansion of Dataset Options for Learning: Members have expressed interest in adding more datasets to tinygrad, indicating a collaborative effort to boost learning opportunities beyond the existing options.
    • The call for new datasets promises a more diverse learning environment, allowing users to experiment with various data types and challenges.

  • Hierarchical Generation Gains Traction: A member shared a blog post on Coupling Generation and Compression, discussing a framework for Hierarchical Generation similar to Stable Cascade models.
    • The article highlights the prevalent paradigm in which a decomposer is trained first, which notably shapes both LLM and image-generation outputs.
  • o1-preview Set to Redefine Zero-shot Capabilities: o1-preview exhibits significant strength in zero-shot (weak) out-of-distribution generalization, outperforming previous models per preliminary findings.
    • o1-mini shows no such advancement, merely matching the previous SOTA, which illustrates the value of pre-training scale in model efficacy.
  • TruthfulQA Shows o1’s Comprehension Skills: o1 posted strong results on TruthfulQA, particularly in grasping common misconceptions, indicating potential in comprehension tasks.
    • Despite its constraints, the performance demonstrates o1’s ability to tackle certain understanding challenges with notable success.

  • Fetching Random Cat Images Made Easy: A new feature demonstrated fetching random cat images using The Cat API, by creating a Cat model and using an HTTP client for image retrieval (a minimal sketch follows this list).
    • The demo emphasizes simplicity, allowing developers to easily integrate cat images into their applications.
  • Limiting Cat Breeds Fetching: A showcased method lets users fetch cat breeds while restricting the number of breeds returned. Code snippets show that a limited set of breeds is retrieved and can be structured into a CatBreed model for efficient access.
    • This enhancement gives developers tighter control over data retrieval, making large datasets easier to handle.
  • Video Demos for Visual Learners: Links to demonstration videos were shared, illustrating the cat image and breed fetching features and clarifying the implementation process.
    • Such resources help developers grasp the tools and implement them with confidence.
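The described feature maps onto The Cat API quite directly. A minimal sketch of the Cat model plus HTTP client follows; the API endpoint is real, while the Cat dataclass is an illustrative stand-in for the demo's model:

```python
# Minimal sketch of the random-cat-image feature; The Cat API endpoint is
# real, the Cat dataclass is an illustrative stand-in for the demo's model.
from dataclasses import dataclass

import requests

@dataclass
class Cat:
    id: str
    url: str

def random_cats(limit: int = 1) -> list[Cat]:
    # Without an API key the service may cap how many images it returns.
    resp = requests.get(
        "https://api.thecatapi.com/v1/images/search", params={"limit": limit}
    )
    resp.raise_for_status()
    return [Cat(id=item["id"], url=item["url"]) for item in resp.json()]

print(random_cats(3))
```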

  • Whisper Turbo German Model Halves Error Rate: The newly introduced Whisper Turbo German model reduces error rates by 50% in various benchmarks compared to earlier versions, according to a source. The model is optimized for German transcription, voice commands, and automatic subtitling (a usage sketch follows this list).
    • It also provides dictation functions for word-processing software, making it a valuable tool for developers working on German-language processing.
  • Applications of Whisper Turbo Model: Key applications of the Whisper Turbo German model include accurate transcription of spoken German, automatic subtitling, and voice-based search queries.
    • Developers can leverage these capabilities to improve accessibility and interaction in German-speaking environments.
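Running such a model is a one-liner with the transformers pipeline. A hedged sketch follows; the checkpoint name is an assumption, so check the actual model card for the release:

```python
# Hedged sketch; the checkpoint name is an assumption - check the actual
# model card for the Whisper Turbo German release.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="primeline/whisper-large-v3-turbo-german",
)
print(asr("aufnahme_de.wav")["text"])
```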

  • Writer’s Palmyra-X-004 Model Update Request: Sam Julien from Writer requested that the Palmyra-X-004 model be added to the leaderboard, following an email from CTO Waseem AlShikh showcasing impressive results in internal benchmarks.
    • The question ‘Do we need to submit a PR?’ highlights their commitment to community engagement.
  • Clarifying Leaderboard Submission Process: Sam also sought clarification about whether a PR is required for adding Palmyra-X-004 to the leaderboard.
    • The inquiry reflects a structured approach to ensuring their advancements are recognized within the community.

The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


LM Studio ▷ #general (204 messages🔥🔥):

  • Llama 3.2 Inquiry
  • MLX Model Issues
  • Model Accessibility
  • New Features in LM Studio 0.3.4
  • Quantization Model Concerns



LM Studio ▷ #hardware-discussion (30 messages🔥):

  • Using dual GPUs
  • Performance of RTX 3060 and RX 6600
  • R9 7900X CPU performance
  • AVX2 support in VMs
  • Difference in GPU architectures

  • Dual GPU setup still limited by performance: Members discussed using both an RTX 3060 and RX 6600 together for a total of 20GB VRAM, while noting that they don’t perform well together, especially in terms of speed.
    • One member suggested that while a dual setup can load larger models, the speed will remain essentially the same due to the performance limits of the 6600.
  • Best choices for increased VRAM: The conversation highlighted that adding a second RTX 3060 would help with loading larger models but wouldn’t increase speed, echoing the sentiment of needing more VRAM for accuracy.
    • One user plans to save for a more powerful GPU, specifically the RTX 3090, acknowledging that their speed of 9-10 tok/sec is manageable for now.
  • Running models on CPU-only setups: A query arose about running models on a CPU-only Ubuntu VM, with members confirming that a CPU with AVX2 instructions is necessary.
    • However, they cautioned that it might be slow and advised trying it out, since some of the software is free.
  • NVIDIA’s shift from NVLink: Discussion revealed that the NVIDIA RTX 4000 series does not support NVLink, moving to PCIe Gen 5 instead for multi-GPU setups.
    • This change sparked interest in the performance capabilities of unconnected GPUs, with users expressing surprise at their speed.

Unsloth AI (Daniel Han) ▷ #general (200 messages🔥🔥):

  • Model Merging at Scale
  • Fine-tuning Qwen 2.5
  • Unsloth AI and Dataset Formats
  • Instruct vs. Base Models
  • Hugging Face Datasets

  • New insights on Model Merging at Scale: Exciting new work on large-scale model merging was shared by @Prateek Yadav, addressing questions about performance when combining larger models of up to 64B parameters. The research evaluates how model size, quality, and method affect performance and generalization.
    • The related paper can be found on arXiv, detailing systematic evaluations and findings.
  • Fine-tuning Qwen 2.5 is now smooth sailing: @theyruinedelise confirmed that fine-tuning Qwen 2.5 models is now problem-free after earlier prompt issues were addressed. A collection of available Qwen 2.5 models can be found on Hugging Face.
    • This reassures users interested in leveraging these models for their tuning tasks.
  • Understanding Dataset Formats for Unsloth: Discussion emphasized that while CSV files can be used for datasets, formats like Parquet with Hugging Face’s default datasets library can be more efficient. Users were reminded to ensure their dataset structure matches the expected column formats (see the sketch after this list).
    • For example, a ‘train’ split and a ‘conversations’ column may be specified for clarity.
  • Distinguishing Between Instruct and Base Models: Users clarified that instruct models are specifically tuned to respond to direct prompts, incorporating refinements for answering questions, unlike base models that mainly predict the next token. This distinction allows for targeted applications in different scenarios.
    • Instruct models may also include alignment bias, which can affect their responses.
  • Exploring Conversion Tools for Datasets: There was a suggestion to convert datasets into formats better supported by Hugging Face, either by writing custom scripts or using existing functions. This ensures datasets are uploaded correctly for the intended training purposes.
    • Using the load_dataset('csv') function can facilitate this process, making it more accessible for users.
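The CSV-vs-Parquet point looks like this in practice with the datasets library:

```python
# Loading a CSV dataset and converting it to Parquet, which preserves types
# and loads faster than CSV across repeated training runs.
from datasets import load_dataset

ds = load_dataset("csv", data_files={"train": "train.csv"})
ds["train"].to_parquet("train.parquet")

ds = load_dataset("parquet", data_files={"train": "train.parquet"})
print(ds["train"].column_names)  # check these match the expected format
```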



Unsloth AI (Daniel Han) ▷ #help (19 messages🔥):

  • Colab gguf file download struggles
  • Using logits with Ollama Llama
  • Continued pretraining of Llama 3.2 3b
  • AMD GPU limitations with Unsloth
  • Fine-tuning with Unsloth FastLanguageModel

HuggingFace ▷ #announcements (1 messages):

  • Nvidia models
  • Meta's VLMs
  • Hugging Face Accelerate 1.0
  • ColPali multimodal retrieval
  • Paper Central

  • Nvidia launches high-efficiency models: Nvidia introduced the Nemotron 51B, a NAS-optimized model achieving 2x throughput on a single H100 GPU while preserving accuracy. Users can experiment with the model through NVIDIA’s API or download it from Hugging Face.
    • The model is accompanied by several others, including NVLM 1.0 and OpenMath, aimed at enhancing AI capabilities.
  • Meta releases improved VLMs: Meta launched new vision models, including CoTracker 2.1, capable of tracking 70k points on a single GPU for video motion prediction. An accompanying paper is available here.
    • The SAM 2.1 model for image/video segmentation also received an update, enhancing its utility for developers.
  • Hugging Face Accelerate 1.0 launched: Hugging Face announced the release of Accelerate 1.0, featuring several new functionalities for seamless AI development. The update was well received, prompting users to explore its improvements.
    • For a detailed overview, an announcement blog is available here.
  • ColPali: New retrieval approach: ColPali introduces an innovative method for multimodal document retrieval, despite some reservations about its practicality. Integration with Qdrant allows efficient indexing and searching of embeddings.
    • The related blog post provides insights on how to use ColPali effectively with existing vector databases.
  • Paper Central for research updates: Hugging Face unveiled Paper Central, a space designed to compile the latest research papers. It aggregates sources like arXiv and GitHub to keep researchers informed.
    • This initiative aims to streamline access to crucial academic resources, enhancing the research community’s productivity.

Links mentioned:

  • Tweet from NVIDIA AI Developer (@NVIDIAAIDev): 👀 Experience high-efficiency NVIDIA Llama-3.1-Nemotron-51B – a NAS-optimized model achieving 2x throughput while preserving accuracy runs on a single H100 GPU. ✨Try out the Llama-3.1-Nemotron-51B N…
  • Tweet from Niels Rogge (@NielsRogge): Meta has released CoTracker 2.1, an improved version of its Transformer-based model for video motion prediction, on @huggingface! Capable of tracking 70k points jointly on a single GPU Paper (with l…
  • Tweet from Tris Warkentin (@triswarkentin): Gemma 2 just got even better! 🚀 New Japanese-tuned 2B model AND a $150K Kaggle competition to build Gemma models for every language. Great to have @sundarpichai here to share the excitement! Read m…
  • Tweet from Zach Mueller (@TheZachMueller): The day has finally arrived, @huggingface Accelerate 1.0 is now out! There are tons of new goodies to explore and plenty more to come. I’ll quickly talk about my favorites 🧵 For a refresher, g…
  • Tweet from merve (@mervenoyann): Your LLM can’t understand videos and images? How sad 😔 Luckily we shipped a new task for video language models 🤗 look for video-text-to-text in left tab at @huggingface /models ⏯️ It also comes…
  • Tweet from Adina Yakup (@AdinaYakup): Here is a collection for leaderboards and Arenas from the Chinese community on @huggingface 🔥🏆🇨🇳 https://huggingface.co/collections/zh-ai-community/leaderboards-and-arenas-664b6913bfd9b93ba4ac242…
  • Tweet from Julian Bilcke (@flngr): How it looks like right now (I’m the only user of the server so it’s smooth 😂)
  • Tweet from Daniel van Strien (@vanstriendaniel): ColPali is an exciting new approach to multimodal document retrieval, but some doubt its practical use with existing vector DBs. It turns out it’s super easy to use @qdrant_engine to index and se…
  • Tweet from JB Delbrouck (@IAMJBDEL): Paper Central is a new 🤗 Hugging Face space designed to provide the most up-to-date information on the latest research papers. It’s the first portal to bring together all key sources in one place…

HuggingFace ▷ #general (134 messages🔥🔥):

  • Model Performance Comparison
  • Mira Network Discussion
  • Model Specificity in Use Cases
  • TensorFlow Issues
  • Python Community Q&A

  • Comparing AI Models for Coding Tasks: Users discussed their experiences with different AI models, noting that Claude Sonnet 3.5 performed significantly better than GPT o1 preview for generating Rust code with fewer prompts.
    • One user shared their strategy of using Claude and GPT in tandem to maximize outcomes when debugging code.
  • Insight into Mira’s Decentralization: A member introduced Mira, a decentralized infrastructure aimed at making AI accessible, highlighting its focus on community-driven projects without crypto tokens.
    • Despite its technological promise, another user expressed moral concerns regarding blockchain and cryptocurrency associations.
  • Need for Clear Model Usage Guidelines: One user questioned the lack of clarity in model cards about specific applications of various AI models, such as architecture and structural engineering tasks.
    • Members noted that detailed model cards often depend on the authors’ effort and expertise in outlining effective use cases.
  • Concerns with TensorFlow on GPU: Several users vented frustrations about TensorFlow’s performance on GPUs, reporting bugs related to tensor initialization that hindered their work.
    • Recommendations were made to explore alternatives or troubleshoot the underlying errors to restore functionality.
  • Engagement in Python and Data Science Discussions: The channel hosted a variety of questions around Python, with users exploring topics like workflow automation and structured data extraction.
    • Overall, the dialogue reflected a blend of technical inquiry and community troubleshooting among peers.

HuggingFace ▷ #today-im-learning (9 messages🔥):

  • Hierarchical Generation
  • Image Autoencoder Integration
  • Differences in Model Types
  • Hugging Face Metrics Implementation
  • Hierarchical Generation Insights: A member shared a blog post discussing the hierarchical generation paradigm, emphasizing the roles of decomposers and generators in model training.

    • They highlighted the importance of compression in generative models, particularly noting how this paradigm applies to both LLMs and image generators.
    • Leveraging Image Autoencoders: Discussion emerged around utilizing an image autoencoder for downstream latent spaces as outlined in the hierarchical generation article.
    • In response, the article’s author explained that the encoder functions similarly to a VAE, trained to produce useful latents for a mini diffusion model.
    • Exploring Model Types and Datasets: One member expressed interest in understanding the distinctions between base and instruct models as well as datasets suitable for LoRA fine-tuning.
    • This illustrates a growing focus on model customization and training data relevance in the community.
    • Evaluating Fine-Tuned Models with Hugging Face: Another member shared their learning process integrating Hugging Face metrics such as ROUGE and BERTScore into their training pipeline to enhance model evaluation.
    • The goal is to move away from other libraries for a more tailored approach in assessing fine-tuned models; a minimal sketch of loading both metrics follows below.
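
A minimal sketch of wiring both metrics in with Hugging Face's evaluate library (the prediction/reference strings here are toy placeholders, and the rouge_score and bert_score backends are assumed installed):

```python
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# Toy inputs standing in for model outputs and gold summaries.
predictions = ["the model produces a short summary of the document"]
references = ["the model generates a brief summary of the document"]

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```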

Link mentioned: coupling generation and compression: no description found


HuggingFace ▷ #cool-finds (1 messages):

  • Scade tools
  • Comfy-Flux integration
  • Custom image generations
  • Experimenting with Scade’s Custom Image Tools: A member shared their experience using Scade to create custom tools including a background remover, hand restoration, and an upscaler for images. These tools can be imported directly from the provided Drive link.

    • The biggest advantage is that building these tools on Scade is cheap, and the Comfy-Flux integration greatly enhances their quality compared to creating tools from scratch.
    • Sharing and Feedback on Custom Tools: The user encourages the community to try out the mentioned tools and provide feedback, expressing hope for suggestions to improve them. They also highlighted sharing these developments on the Scade community for wider visibility.
    • The member emphasized that using these tools effectively can enhance image generation quality while maintaining ease of use.

HuggingFace ▷ #i-made-this (4 messages):

  • VividNode Updates
  • Burnout in Tech Creators
  • Fine-tuning Whisper for ATC
  • FluxBooru-CFG3.5
  • VividNode v1.4.0 Introduced: The release of VividNode v1.4.0 includes support for gpt4free, allowing users to manually select providers and models, enhancing user flexibility.

    • Despite its capabilities, gpt4free faces challenges such as token limits, and the advantages of offline LLM usage remain salient.
    • Tech Creators Face Burnout: A tech creator expressed feelings of burnout from balancing work and side projects, highlighting the struggle to keep pace with rapid advancements.
    • They plan to recruit contributors post v1.5.0 release, acknowledging that support is often only offered when actively sought.
    • Fine-tuning Whisper Model Boosts Performance: A blog post was published detailing the fine-tuning of a Whisper model on pilot-air traffic control communications, yielding an 84% relative performance improvement.
    • This process reduced the word error rate (WER) from 94.59% to just 15.08%, showcasing the impact of tailored ASR solutions; the arithmetic behind the relative figure is sketched after this list.
    • Resources for Fine-tuning Whisper: The models and datasets used for fine-tuning Whisper are now shared on Hugging Face, including a GitHub repository and the dataset.
    • Links to the blog post and the Hugging Face models are also provided for further exploration.
    • FluxBooru-CFG3.5 Released: A link to the FluxBooru-CFG3.5 space on Hugging Face was shared, indicating recent developments in this area.
    • Details about its features and applications were not elaborated upon in the message.
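
As a sanity check on the Whisper numbers above, the relative figure follows directly from the two error rates; a small sketch (jiwer is one common WER implementation, and the transcript strings are made up):

```python
# Relative improvement from the reported WERs.
baseline_wer, finetuned_wer = 0.9459, 0.1508
print(f"{(baseline_wer - finetuned_wer) / baseline_wer:.0%}")  # -> 84%

# Word error rate for a single hypothesis/reference pair via jiwer.
import jiwer
print(jiwer.wer("cleared for takeoff runway two seven", "cleared for takeoff runway to seven"))
```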

Link mentioned: Release v1.4.0 · yjg30737/pyqt-openai: What’s Changed Add is_g4f and g4f_platform fields to message table, remove old text Fix problems related to recent update, rename file Move every function in globals.py to utils.py for better org…


HuggingFace ▷ #NLP (8 messages🔥):

  • ONNX conversion of T5 models
  • Exploratory analysis of legal documents
  • Big data technologies discussion
  • Validation of LLM outputs
  • Server setup for Hugging Face pipelines
  • Locating T5 ONNX files: A member pointed out that the required ONNX files for the T5 model can be found under the ONNX folder on the Hugging Face page, suggesting a download if needed.

    • They also shared a link on different ways to convert Transformers models to ONNX, including a specific example with distilbert-base-uncased-finetuned-sst-2-english; one such export route is sketched after this list.
    • Exchange ideas on legal docs: One member sought to engage with others who have experience in exploratory analysis of legal documents, expressing a desire to exchange ideas and problems.
    • No specific responses were noted, indicating potential interest in the topic but limited engagement.
    • Big Data technologies inquiry: A member reached out to see if anyone was well-versed in Big Data technologies, particularly Kafka and Hadoop.
    • This inquiry highlights a potential interest in leveraging these technologies in their projects.
    • Validating unknown LLM outputs: A member requested techniques for validating unknown outputs from LLMs as JSON, aiming for validation and cleaning in Python and JavaScript.
    • Another member recommended the jsonschema library, which they have used with varying success; a minimal validation sketch also follows this list.
    • Efficient server setup for Hugging Face pipelines: A member shared their experience of using Triton Inference Server for loading Hugging Face pipelines but expressed concerns about over-engineering without a GPU.
    • They are exploring alternatives for setting up a server with 3-4 models that processes HTTP requests without needing Docker containers for each model.
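
On the ONNX question, one documented conversion route is the optimum wrapper; a hedged sketch using the distilbert checkpoint named above (assumes optimum[onnxruntime] is installed):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch weights to ONNX at load time.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("onnx-out")      # writes model.onnx and config files
tokenizer.save_pretrained("onnx-out")
```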
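
And on validating LLM output, a minimal Python sketch with the jsonschema package recommended above (the schema and raw string are illustrative):

```python
import json
import jsonschema

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "score": {"type": "number"}},
    "required": ["name", "score"],
}

raw = '{"name": "example", "score": 0.92}'  # stand-in for an LLM response

try:
    data = json.loads(raw)             # rejects output that is not JSON at all
    jsonschema.validate(data, schema)  # rejects JSON with the wrong shape
    print("valid:", data)
except (json.JSONDecodeError, jsonschema.ValidationError) as err:
    print("invalid output:", err)
```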

HuggingFace ▷ #diffusion-discussions (8 messages🔥):

  • Image Quality in Diffusion Models
  • Flux Loras and GGUF Training
  • Training Diffusion Models with Diffusers
  • Assessing Image Quality in Diffusion Models: Members discussed the low resolution of a particular image, suggesting it could be produced by Flux or a pony model with a griffin lora, but noted it appeared to be post-processed.

    • It was highlighted that the image could depict any random person due to its generic nature and lack of detail.
    • Clarifying Diffusers and Model Types: A member clarified that diffusers is a library enabling the use of various diffusion models, specifically noting Stable Diffusion XL and Flux as capable models.
    • This generative flexibility allows such models to be integrated through the diffusers library; a minimal loading sketch follows this list.
    • Training Flux Loras on GGUF Formats: A member inquired about training Flux loras and finetunes using flux gguf formats, leading to a mention that gguf is not yet supported but training with 6GB GPUs is possible using Kohya.
    • There are suggestions that gguf provides more accuracy compared to fp16, but there isn’t sufficient comparison data for int4 versions yet.
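
To make the library/model distinction concrete, a minimal diffusers sketch loading the public SDXL base checkpoint (assumes a CUDA GPU and the diffusers/torch packages; the prompt is arbitrary):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# The same pipeline-level API pattern covers other supported models (e.g. Flux).
image = pipe("a griffin perched on a ruined tower, photorealistic").images[0]
image.save("griffin.png")
```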

Cohere ▷ #discussions (38 messages🔥):

  • Temperature settings for CMD-R
  • JSON schema discussions
  • Introduction of new members
  • Nobel Prize speculation
  • HumanEval and QA processes

Link mentioned: Tweet from Omar Sanseviero (@osanseviero): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Literature to the Attention Is All You Need authors. Their work has made thousands cry, laugh, or ric…


Cohere ▷ #questions (36 messages🔥):

  • Cohere API Issues
  • System Role Formatting
  • ETL Pipeline for RAG
  • Zero Data Retention for LLMs
  • Cohere API connection issues: A member reported intermittent connection issues with the Cohere API, receiving an error indicating that the ‘ChatResponse’ object has no attribute ‘text’. After troubleshooting, they discovered that using response.message.content[0].text resolved the problem.

    • Members shared updates and tests on troubleshooting code, suggesting recent API updates might have contributed to confusion; the working access pattern is sketched after this list.
    • Formatting System Role in Markdown: A member inquired about the language structure necessary for shaping the system role, to which it was confirmed that formatting the task and context as markdown yields better results. Documentation was provided for further guidance on constructing effective system messages.
    • A sample system message structure was mentioned, detailing how concise instructions can guide the model’s behavior efficiently.
    • Exploring ETL Solutions for RAG: A user introduced their capstone project focused on developing an ETL pipeline for unstructured data processing aimed at retrieval-augmented generation (RAG). They sought community insights and experiences related to this technology.
    • Community members pointed out the availability of numerous use cases and blogs on Cohere’s website, as well as expressed interest in hearing individual experiences with similar projects.
    • Zero data retention for enterprise users: A user expressed concerns about customer data retention policies, particularly regarding LLMs storing prompts for longer periods. They were informed that zero data retention options exist for enterprise customers under certain usage commitments.
    • Clarification was provided about the conditions under which Cohere can offer this option, linking it to enterprise agreements.
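
A hedged sketch of the access pattern from the first item (Cohere Python SDK v2; the model name and prompt are illustrative, and a CO_API_KEY environment variable is assumed):

```python
import cohere

co = cohere.ClientV2()  # picks up CO_API_KEY from the environment
response = co.chat(
    model="command-r",
    messages=[{"role": "user", "content": "Summarize prompt caching in one line."}],
)

# v2 responses expose text via message.content, a list of content blocks,
# rather than a top-level .text attribute:
print(response.message.content[0].text)
```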

Cohere ▷ #api-discussions (46 messages🔥):

  • Cohere API usage
  • RAG solution for banks
  • Embedding model performance
  • Trial key limitations

Cohere ▷ #projects (29 messages🔥):

  • Emotional State Machine
  • Emotion Propagation
  • AI and Mental Health
  • Emotion in Voice AI

aider (Paul Gauthier) ▷ #general (70 messages🔥🔥):

  • Aider and Model Integration
  • Using OpenRouter with Aider
  • Feedback on Architect Mode
  • Community Discussions on LLMs
  • Issues with Aider Functionality

Links mentioned:

  • OpenRouter: aider is AI pair programming in your terminal
  • OpenAI compatible APIs: aider is AI pair programming in your terminal
  • Installing aider: aider is AI pair programming in your terminal
  • Chat modes: Using the chat, ask and help chat modes.
  • Providers | liteLLM: Learn how to deploy + call models from different providers on LiteLLM
  • OpenRouter: LLM router and marketplace
  • Advanced model settings: Configuring advanced settings for LLMs.
  • aider/aider/coders/architect_prompts.py at cd3e0ae91424c9d31f7b332e59c9f843eb0a7990 · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
  • Exponent: Exponent is your AI Pair Programmer.
  • litellm/model_prices_and_context_window.json at main · BerriAI/litellm: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format – (Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq) – BerriAI/litellm

aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):

  • Aider Usage Queries
  • Model and Mode Configuration
  • File Handling in Aider
  • Architect Mode Feedback
  • Performance Optimizations

Links mentioned:


  • OpenAI Realtime Console
  • Cline AI Assistant v2.0
  • Firefox Security Update
  • OpenAI Realtime Console enables easy voice API access: A demo repository for the OpenAI Realtime Console was successfully set up, allowing users to easily test the new Realtime voice API announced at DevDay. This setup requires only a simple npm start to run the application locally.

    • Users can interact via voice input and output; however, be warned that testing incurs costs, with one user reporting $3.87 in charges for just 15 minutes of use.
    • Cline AI Assistant 2.0 boasts impressive upgrades: The newly released Cline (formerly Claude Dev) v2.0 introduces features such as streamed responses directly into your editor and a cancel button for task management. The new XML-based tool calling prompt reduces requests by about 40%, improving resource efficiency.
    • A community member praised Cline, stating it’s mega freakin good!, highlighting its strong performance enhancements across various use cases.
    • Critical Firefox update due to security vulnerabilities: A critical exploit in Firefox has been announced, urging users to update to version 131.0.2 to mitigate potential risks associated with a use-after-free vulnerability. This advisory, released by Mozilla, indicates active exploitation of the vulnerability, with specific details outlined in Mozilla’s advisory.
    • Users expressed gratitude for the heads up regarding this serious security risk, emphasizing the importance of immediate updates for safety.

Interconnects (Nathan Lambert) ▷ #news (44 messages🔥):

  • 2024 Nobel Prize in Chemistry
  • Nato's academic background
  • LMSYS becoming a company
  • Editing Google Scholar
  • Challenges in energy sciences

Links mentioned:

  • Tweet from The Nobel Prize (@NobelPrize): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to…
  • MEMS – Wikipedia: no description found

Interconnects (Nathan Lambert) ▷ #ml-questions (27 messages🔥):

  • PRMs/Verifiers in ML
  • Patents in ML space
  • Alternatives to PRMs
  • Transparency in ML research
  • Scarcity of PRM Research: Members discussed the lack of research on PRMs (process reward models), with one humorously noting that there are ‘almost none on PRMs, almost a billion as LLM as a judge’.

    • Others expressed interest in finding good papers, signaling confusion about their current usefulness.
    • Patenting Confusion in ML: The discussion shifted to how patents work within the machine learning space, with insights that companies file them defensively, often leading to invalidation due to vagueness.
    • Concerns were raised about the financial burden of pursuing violations that are nearly impossible to prove, likening it to a ‘rough deal’.
    • Alternatives to PRMs Emerging: There was curiosity about what methods are replacing PRMs, with hints that big labs still utilize them, but their importance is fading.
    • Discussion pointed towards the possibility that reinforcement learning on deterministic outputs could suffice without the complexity of PRMs.
    • Exploring O1 Functionality: In the context of the O1 release, members questioned what would fill the PRM role, highlighting concerns about the need for some form of scoring during reasoning tree exploration.
    • Despite mixed feelings about the necessity of PRMs, insights from reputable sources like John Schulman were mentioned as a reassurance.
    • Advocating for Transparency: Nathan Lambert advocated for greater transparency in ML research processes, asserting it is simpler than maintaining secrecy.
    • This perspective was echoed by the group, inferring that openly sharing methodologies might lead to more efficient execution.

Interconnects (Nathan Lambert) ▷ #ml-drama (16 messages🔥):

  • AI risks discussion featuring prominent figures
  • Nobel Prize controversy in AI research
  • ICLR 2025 review process changes
  • Schmidhuber's critique on attribution in AI
  • Social media reactions to discipline-related insights
  • AI Risks Perception in Academia: Discourse on AI risks includes remarks about Geoff Hinton and Yoshua Bengio‘s motivations for residing in Canada, highlighting personal history behind their views on AI governance.

    • A user remarked, ‘Bear that in mind when you hear them tell California what it should do about AI risks.’
    • Schmidhuber Slams Nobel Prize Selection: Criticism surfaced over the Nobel Prize in Physics 2024, with claims of plagiarism and misattribution in works by Hinton and collaborators, particularly concerning Amari’s contributions.
    • Schmidhuber argued that important historical contributions to AI were ignored, declaring the selection was more about amplifying known names than honoring original innovators.
    • ICLR 2025 Introduces Review Feedback Agent: The ICLR 2025 conference aims to enhance review quality with a feedback agent designed to guide reviewers toward more consistent, constructive evaluations, amid surging submission numbers.
    • This initiative highlights the challenges posed by rapidly increasing submissions and aims to mitigate the quality discrepancies noted in past reviews.
    • Mixed Reactions on Schmidhuber’s Insights: Users echoed varied sentiments about Schmidhuber’s outspoken critique on AI attributions, with some expressing agreement on the significance of his points regarding historical contributions.
    • As one user stated, ‘Tbfair he often has a point,’ reflecting on Schmidhuber’s influence in ongoing discussions.
    • Concerns Over Review Process Changes: Fears of potential drama emerged at the prospect of alterations to the peer review process at ICLR, emphasizing the community’s sensitivity to change.
    • The notion that ‘any change in the review process will result in drama’ highlights the apprehension prevalent among conference participants.

Interconnects (Nathan Lambert) ▷ #memes (16 messages🔥):

  • ButtBench Alignment Project
  • SuperAlignment lead at AI2
  • Industry vs Community
  • Lucas Beyer's PR Tactics
  • Allennlp account management

Interconnects (Nathan Lambert) ▷ #reads (3 messages):

  • Data Wall in AI
  • Brute-force approach for AI development
  • Human reasoning vs AI data requirements
  • Efficiency in AI models

Link mentioned: The real data wall is billions of years of evolution: Careful with those human analogies


Interconnects (Nathan Lambert) ▷ #posts (15 messages🔥):

  • RoboNato vs RealNato
  • OLMo fine-tuning for content
  • NotebookLM concept

Perplexity AI ▷ #general (100 messages🔥🔥):

  • Perplexity AI Profitability
  • Complexity Extension for Perplexity
  • Changes in Perplexity AI Responses
  • Future of Collections and Spaces
  • Access to Pro Features
  • Questions About Perplexity AI’s Profit Model: Discussions surfaced about how Perplexity AI is making a profit, especially with student discounts in play, leading to concerns over their business model.

    • sneakyf1shy humorously suggested that it’s all about venture capital, highlighting the uncertainty around their long-term goals.
    • Complexity Extension Enhancements Enthusiasm: The Complexity extension was described as supercharging the Perplexity experience with features like customizable themes and markdown export options.
    • The community noted it’s ‘like Perplexity on steroids,’ enhancing user interactivity, while users feline and asura0_00 emphasized its usefulness.
    • Perplexity AI’s Condensed Responses: Members discussed noticing that Perplexity AI’s answers have become more condensed, expressing concern over shorter, less informative responses.
    • Some speculated this might be linked to changes in token limits, affecting the depth of answers provided.
    • Hopes for Improved Collections and Spaces: There were updates regarding the move from ‘collections’ to ‘spaces’, aimed at improving user experience and productivity on the platform.
    • Users expressed hope for enhancements like increased prompt limits and better integration into the searching process.
    • Pro Features and API Capabilities: Users inquired whether Pro accounts would receive access to dedicated features such as the o1 model and limits on the o1-mini.
    • Responses were unclear, leading to discussions about potential future features and how they’ll affect the user experience.

Perplexity AI ▷ #sharing (12 messages🔥):

  • Meta's Movie Generator
  • Nobel Prize in Physics
  • AI Automation in Ports
  • AI Evaluation by Braintrust
  • 2024 Summer Olympics

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (4 messages):

  • Citation API Whitelisting
  • API Credit Purchase Issues
  • Invoice Company Details Update
  • Declined Card for API Access
  • Frustration over Citation API Whitelist Requests: A member expressed concerns about getting whitelisted for the citation API, noting that multiple requests via email, form, and helpdesk have gone unanswered.

    • No updates have been provided so far, leading to growing frustration.
    • Persistent Payment Failures for API Credits: A user reported issues when trying to purchase API credits, stating that attempts failed without any error messages, showing only a $XX pending status that disappears shortly.
    • They noted that the only available payment method is via credit card, raising questions about other options.
    • Need for Company Details on Invoice: A member indicated that their invoice defaults to their Google email and requires updating to reflect their company’s name and address, creating complications.
    • They are seeking guidance on how to proceed with this change.
    • Decline Issues with API Access Card: One member shared frustration with their card being declined when attempting to use the API, even for a $0.00 charge.
    • They are looking for potential reasons why the card is failing, which remains unclear.

Stability.ai (Stable Diffusion) ▷ #general-chat (108 messages🔥🔥):

  • ControlNet Models
  • Flux Inpainting
  • Kaggle Notebooks for Automatic1111
  • Distilled CFG Explained
  • Deforum Usage Alternatives
  • ControlNet Models Explained: A member asked about ControlNet models, prompting another to share a GitHub link with information and examples to explore, suggesting scrolling past the mathematical content.

    • Scroll a bit down, ignore the math and look at the examples.
    • Flux inpainting performance: Discussion arose regarding Flux and Schnell inpainting models, where one member noted it should take about 1-2 minutes on a decent GPU instead of 25 minutes as experienced by another.
    • Key differences in iterations between Flux dev and Schnell arise from their performance and purpose.
    • Need for Kaggle Notebook for Automatic1111: A member requested a Kaggle notebook for using Automatic1111, highlighting the demand for resources oriented towards image generation techniques.
    • Others chimed in, noting challenges in finding specific notebooks to facilitate the process.
    • Understanding Distilled CFG: Confusion emerged around distilled CFG and its implications, with discussions highlighting that it differs from the standard CFG and operates as a form of guidance established by model training.
    • The community clarified how Flux dev simplifies CFG use but lacks support for negative prompts.
    • Using Deforum after Google Colab Restrictions: A member inquired about using Deforum for free after Colab restrictions were noted, prompting suggestions related to renting GPUs for this purpose.
    • Resources like RunPod were recommended as alternatives for accessing necessary computing power.

Eleuther ▷ #general (86 messages🔥🔥):

  • Nobel Prizes in AI and Chemistry
  • PhD Course Competition and Metrics
  • Web3 and Web5 Discussions
  • Publications and Research Collaboration
  • Current Topics in Chess and AI
  • Controversy Surrounding Nobel Prizes in AI and Chemistry: The recent Nobel Prize awards sparked discussions on the relevance of AI figures like Hinton and Hopfield, with opinions split on the perceived impact on physics and chemistry fields.

    • One member emphasized that if a prize rewards leaders in a field, it could dilute the prestige of the award itself, while another countered that enthusiasm and innovation should be the key selection criteria.
    • Competition for PhD Programs and Research Metrics: A member expressed frustration over the emphasis on publication metrics, stating that it creates a competitive and daunting atmosphere for aspiring PhD candidates.
    • Opinions varied, with some suggesting that networking could be more effective than merely chasing publication numbers to secure future collaborations and mentorship.
    • The Evolution of Web3 Towards Web5: Members discussed the transitions from Web3 to Web5, highlighting how the naming strategy seems akin to the Fibonacci sequence rather than a logical progression.
    • The conversation took a light-hearted turn with jokes about future developments, including speculation on Web8 arising from the mix of previous iterations.
    • Research Collaboration and H-Index Metrics: There was debate on the value of collaboration versus the quality of research outputs in establishing a competitive H-index, with some cautioning against simply focusing on quantity.
    • Members acknowledged that while having impactful research can propel a career, the pressure to publish frequently to boost metrics remains a systemic issue.
    • Chess and Notable Figures: The FIDE Chess Olympiad was mentioned, sparking discussions around prominent figures like Demis Hassabis and their connections to various communities, including chess.
    • Members expressed surprise at the cross-pollination of interests between chess and AI, illustrating how figures in AI often hold significant status in different domains.

Eleuther ▷ #research (7 messages):

  • Weight Normalization in Models
  • Gradient Initialization Techniques
  • Power-Laws in Gradient Descent

Eleuther ▷ #scaling-laws (6 messages):

  • Scaling Laws Overview
  • Kaplan's Scaling Laws
  • Data and Model Size Relationship
  • Scaling Laws Overview Sparks Debate: A member shared an overview that states cross-entropy loss decreases with quadratic compute increase, proposing square root scaling based on this article.

    • Another member challenged this by noting that Kaplan’s laws suggest a constant of 0.28, leaning towards fourth-root scaling instead.
    • Doubts About Kaplan’s Relevance: Discussion ensued about Kaplan’s relevance, with a member stating it is out of date, yet it and Chinchilla seem to agree on certain scaling aspects.
    • It was mentioned that L(N, D) varies approximately as N^-0.5 and D^-0.5, where C = 6ND; a short derivation of the implied compute exponent follows this list.
    • Model Size Considerations: A member questioned how the D^-0.5 term applies when the model size is already large and adjustments are made by increasing data or steps, essentially doing less than 1 epoch of training.
    • They expressed a need for it to align with 0.25 scaling to match their mathematical calculations.
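
For readers reconciling the exponents, a short derivation under the assumptions quoted above (the -0.5 loss exponents and the C = 6ND approximation; this is the discussion's arithmetic, not a claim about which law is right):

```latex
\[
  L(N, D) \approx A\,N^{-0.5} + B\,D^{-0.5}, \qquad C = 6ND .
\]
% With a compute-optimal split N \propto D, both satisfy N, D \propto C^{1/2}, so
\[
  L \propto \bigl(C^{1/2}\bigr)^{-0.5} = C^{-0.25},
\]
% that is, fourth-root scaling in compute, the 0.25 exponent sought above.
```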

Eleuther ▷ #lm-thunderdome (3 messages):

  • 0-shot COT model releases
  • Evaluation implementation details
  • JAX libraries and implementations

Eleuther ▷ #multimodal-general (1 messages):

tensor_kelechi: What are the best lightweight VLMs?


GPU MODE ▷ #general (7 messages):

  • HBM vs SRAM scaling
  • 3D Stacking Solutions
  • Memory Architecture in AI
  • Manufacturing Difficulties
  • Rotary Embeddings CUDA Kernel
  • HBM’s Performance Compared to Expectations: Concerns were raised regarding HBM not performing better than initially expected, still representing a HUGE cost percentage in products like the H100 while not significantly reducing power consumption compared to LPDDR5.

    • The key bottleneck in supplying more H100s was identified as the required packaging.
    • SRAM Scaling Issues Surprises Industry: Unexpectedly, SRAM scaling slowed relative to logic, leading to significant design challenges for Graphcore, which were difficult to predict at the time of their design choices around 2015.
    • As one member stated, ‘there is no conference you could have gone to’ to foresee this development.
    • 3D Stacking as a Mitigation Strategy: Going forward, the proposed solution involves 3D stacking like that seen in MI300X, where processors are stacked on base dies manufactured on older processes for efficient resource allocation.
    • This approach allows moving SRAM and I/O off the leading-edge process die, facilitating better logic scaling on advanced nodes like 3nm and 2nm.
    • Difficulties in Understanding Memory Technologies: A member shared their learning process about the differences between DRAM and HBM, using resources like Claude and a video titled ‘The Special Memory Powering the AI Revolution’ from Asianometry.
    • They highlighted the importance of understanding the manufacturing process and difficulties, especially concerning die bonding.
    • Inquiry on CUDA Kernel for Rotary Embeddings: A request was made for a CUDA kernel dedicated to calculating inverse frequency for rotary embeddings, reflecting a need for more specific technical resources.
    • This reflects ongoing interest in optimized implementations for specialized AI applications; the underlying inverse-frequency computation is sketched below.
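
For reference on that request, the inverse-frequency term in the standard RoPE formulation is simple enough to sketch in Python; a CUDA kernel would parallelize the same arithmetic over the index i (the helper name here is ours):

```python
import torch

def rope_inv_freq(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # inv_freq[i] = base^(-2i / head_dim) for i = 0 .. head_dim/2 - 1
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)

print(rope_inv_freq(8))  # four frequencies for an 8-dim head
```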

GPU MODE ▷ #triton (4 messages):

  • Triton source files
  • GitHub repository structure
  • Finding Triton MatMul Source File: User sought the source file for triton.ops.blocksparse.matmul.matmul, asking for a GitHub link due to difficulty in locating it.

    • Another member pointed out that the file lives at python/triton/ops/blocksparse/matmul.py in the Triton repository, per the pinned-commit link below.
    • Changes in Triton Repository: User questioned the absence of the MatMul file in the main branch, wondering if it had been migrated or transformed.
    • The responding member expressed uncertainty about the migration, admitting they’ve never contributed to Triton, but recognized the need to do so.

Link mentioned: triton/python/triton/ops/blocksparse/matmul.py at 5b29da719daeb3566bfc95b7d02f3561e505bcaf · triton-lang/triton: Development repository for the Triton language and compiler – triton-lang/triton


GPU MODE ▷ #torch (1 messages):

  • PyTorch API changes
  • torch._dynamo migration
  • GitHub issue suggestions
  • Upgrade Woes with PyTorch API: A member encountered difficulties upgrading to the latest PyTorch release, noting that torch._dynamo.allowed_functions has been superseded by a new API.

    • They are tracing the Git history to understand the correct migration path and seek advice for undocumented API replacements; one candidate replacement is sketched below.
    • Seeking Help or GitHub Issue Guidance: The member is uncertain whether discussing their migration issues here is appropriate or if they should open a GitHub issue.
    • They opened the floor for suggestions on strategies or resources to resolve the API replacement challenges they are facing.
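
One public API that has absorbed part of this role in recent releases is torch.compiler.allow_in_graph; whether it covers this member's exact allowed_functions use is an assumption worth checking against the release notes. A sketch:

```python
import torch

def my_helper(x):  # hypothetical user function
    return x * 2

# Mark the helper as safe to inline into compiled graphs.
torch.compiler.allow_in_graph(my_helper)

@torch.compile
def f(x):
    return my_helper(x) + 1

print(f(torch.ones(2)))
```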

Link mentioned: mace/mace/tools/compile.py at 118a514efde34d963666118ce45360e94d648ef5 · ACEsuit/mace: MACE – Fast and accurate machine learning interatomic potentials with higher order equivariant message passing. – ACEsuit/mace


GPU MODE ▷ #beginner (1 messages):

vayuda: do macs with m series chips use arm sve instructions


GPU MODE ▷ #pmpp-book (3 messages):

  • 5th Edition Release
  • Special Offers for Existing Users

GPU MODE ▷ #torchao (39 messages🔥):

  • torchao Integration with ComfyUI
  • Float8 Quantization Performance
  • Row-wise vs Column-wise Scaling in FSDP2
  • Quantization Issues on Windows
  • torch.inference_mode Limitations
  • torchao struggles with ComfyUI integration: A user hit an operator error while enabling torchao for ComfyUI, specifically when calling a quantize_ function inside torch.inference_mode().

    • Despite attempts with PyTorch nightlies and model adjustments, the problem persists without clarity on whether it’s Windows-specific.
    • Float8 quantization yields unexpected results: One member shared that using float8_dynamic_activation_float8_weight improved throughput by ~10% on a GPT model, but encountered latency due to the unwrap_tensor_subclasses function.
    • Discussion suggested that eliminating this function could be possible with the right PyTorch version, but exact reproduction remains difficult due to work project constraints; the float8 quantization call itself is sketched after this list.
    • Row-wise vs Column-wise scaling confusion in FSDP2: Discussion highlighted that row-wise scaling may not work with backward in FSDP2 due to weight transposition during backpropagation, complicating proper scaling.
    • Essentially, while row-wise scaling allows for independent GPU computation, column-wise scaling faces challenges needing all-reduce operations across GPUs.
    • Windows quantization issues with torchao: The integration of torchao on Windows led to operator errors, leading to speculation whether these issues are inherent to Windows or the ComfyUI framework.
    • Past implementations with Hugging Face’s optimum-quanto produced inadequate results, highlighting potential framework concerns.
    • Limitations of torch.inference_mode(): It was pointed out that once inside torch.inference_mode(), users find it difficult to exit, leading to performance constraints.
    • Some participants conveyed that the mode offers minimal utility when compiled, reinforcing the idea of forwarding such issues to specific developers for further insights.
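
A hedged sketch of the float8 call discussed above (torchao API names as of recent releases; per the thread, the quantize_ call should happen outside torch.inference_mode()):

```python
import torch
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("cuda", torch.bfloat16)

# quantize_ rewrites the weights in place; calling it under
# torch.inference_mode() is what triggered the operator errors above.
quantize_(model, float8_dynamic_activation_float8_weight())

with torch.no_grad():
    out = model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))
print(out.shape)
```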

Link mentioned: GitHub – huggingface/optimum-quanto: A pytorch quantization backend for optimum. Contribute to huggingface/optimum-quanto development by creating an account on GitHub.


GPU MODE ▷ #off-topic (1 messages):

vayuda: apparently hinton is the first “pure cs” nobel prize winner


GPU MODE ▷ #llmdotc (15 messages🔥):

  • GPT2 Training Issues
  • Understanding Dependencies in Coding
  • floatX Definition and Usage
  • Using IDE Features Effectively
  • GPT2 Training Encounters TypeError: A member reported an issue while running GPT2 training, receiving a TypeError related to the normal_() function in PyTorch 2.0.0 due to an unexpected keyword argument ‘generator’.

    • Another suggested understanding the complexities of training, including initialization and the forward/backward passes.
    • floatX Definition Explained: An explanation was provided that floatX is defined as nv_bfloat16 or float depending on whether the build targets bf16 or fp32. A member sought help on where to find this definition and how to include it.
    • Dependency Management Concerns: A member expressed difficulty managing dependencies while coding and was unsure how cross-file references were resolved. Others suggested that working with just CUDA should suffice, and that cuDNN is optional.
    • Importance of IDE Features: A discussion emphasized the value of IDE functionalities, such as jumping to function/type definitions, for efficient coding. Learning these skills was highlighted as beneficial for any programmer.

GPU MODE ▷ #rocm (2 messages):

  • Raspberry Pi 5
  • External GPU setup
  • amdgpu Linux kernel patch
  • 4K gaming performance
  • Gearing Up Raspberry Pi 5 for 4K Gaming: After witnessing Pineboards’ 4K Pi 5 external GPU gaming demo at Maker Faire Hanover, a member decided to set up a GPU test rig to explore the Pi OS amdgpu Linux kernel patch.

    • They documented the state of the patch and shared insights on how to apply it while aiming for full external GPU support on the Raspberry Pi.
    • Live Testing External GPU on Raspberry Pi 5: The member tested the setup in a livestream over the weekend, showcasing the AMD RX 460 external GPU paired with the Raspberry Pi 5.
    • The testing demonstrated the GLmark2 performance, revealing significant opportunities for future GPU enhancements.

Link mentioned: Use an External GPU on Raspberry Pi 5 for 4K Gaming | Jeff Geerling: no description found


GPU MODE ▷ #bitnet (1 messages):

tiendung: how good is it compare to original method? (need CPU)


GPU MODE ▷ #webgpu (2 messages):

  • Testing WebGPU
  • Browser Automation vs Native Development
  • Resource Management in Playwright

GPU MODE ▷ #liger-kernel (2 messages):

  • FusedLinearJSD Implementation
  • Performance Metrics in High BT
  • Launch of FusedLinearJSD: The recent pull request introduced the FusedLinearJSD, enabling efficient handling of the final linear layer by avoiding large logits tensor materialization.

    • This is similar to the existing fused linear CE approach and optimizes both the forward and backward pass for improved execution; the divergence being fused is written out below.
    • Challenges with Benchmarking Speed: Memory peak is significantly lower, but the speed benefit appears mainly at high BT (batch size × sequence length), which was hard to benchmark due to out-of-memory issues.
    • The naive torch version encountered OOM errors, making it impossible to conduct proper performance testing in this context.
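
For reference, the (unweighted) Jensen–Shannon divergence the kernel fuses over the logits is:

```latex
\[
  \mathrm{JSD}(P \,\|\, Q)
    = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
  \qquad M = \tfrac{1}{2}(P + Q).
\]
```

The Liger implementation may use a weighted generalization of this; see the pull request linked below for the exact form.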

Link mentioned: Add FusedLinearJSD by Tcc0403 · Pull Request #300 · linkedin/Liger-Kernel: Summary similar to the fuse linear CE. It handle the forward and backward pass of the final linear layer via JSD by avoiding the materialization of the large logits tensor. Since JSD is the last la…


GPU MODE ▷ #metal (2 messages):

  • GPU integer operations
  • bfloat16 support on M2

OpenAI ▷ #ai-discussions (75 messages🔥🔥):

  • ChatGPT vs. Claude Subscriptions
  • O1 and O1 Mini Models
  • AI Evolution and Consciousness
  • Routing Models in AI
  • Challenges in AI Development
  • Choosing Between ChatGPT and Claude Subscriptions: A member advised against subscribing to ChatGPT solely for features in preview, suggesting usage caps limit its appeal, while noting that access to GPT-4 legacy and 4o might be worthwhile.

    • They emphasized that if subscribing, the purpose should be to use fully functional versions rather than limited previews.
    • Understanding O1 vs. O1 Mini Models: Members discussed the differences between O1 and 4o models, noting that O1 models serve as ‘reasoners’, summarizing thoughts and declining to answer when unsure.
    • The O1-mini offers 50 uses per day, while 4o provides 80 uses per 3 hours, leading to a discussion on A/B testing between the two models.
    • Theoretical Exploration of AI Evolution: A discussion arose regarding the potential evolution of AI consciousness, with insights on the necessity of re-training and fine-tuning models to advance capabilities.
    • Members pondered if and when evolved AI models might become commercially viable, with references to a potential business model surrounding these advancements.
    • The Concept of Routing Models in AI: The concept of routing models was explored, discussing how such a model could direct queries to either O1 or 4o based on task requirements.
    • This would optimize user experiences, preventing over-reliance on a single model for diverse tasks.
    • Challenges and Perspectives in AI Development: Members shared thoughts on the challenges faced in AI development, particularly around achieving AGI, suggesting that current models remain narrow, despite advancements.
    • The conversation touched on the marketability of AI and its direction in parallel with ongoing research efforts, comparing insights to a cultural obsession with AGI.

Link mentioned: Tweet from Mark Johns / Doomlaser (@Doomlaser): My poem about AI, in the form of a nonet. I am not an AI hater, AI is hated and feared by many, Cherished by others who know it well, Which side will win the battle, Of what it is to be, To live diff…


OpenAI ▷ #prompt-engineering (2 messages):

  • ChatGPT rewriting responses
  • Dall-E prompts
  • Canvas feature
  • User frustrations with ChatGPT’s rewriting: A user expressed dissatisfaction, stating that ChatGPT often rewrites their responses, leading them to quit using the tool for months.

    • They mentioned experiencing headaches from trying to fix what they described as a ‘stupid flaw’ and seek advice on preventing this behavior.
    • Possible causes for rewriting behavior: Another member speculated that ChatGPT’s rewriting could occur in Canvas or with Dall-E prompts, suggesting a focus on these features.
    • For Dall-E, they recommended using the phrase ‘Make an image using these exact words: (your words)’ to prevent rewriting.
    • Request for clearer examples: A response indicated a need for clarification, asking the user to share a specific conversation to better understand the rewriting issue.
    • This suggestion aimed at providing more targeted assistance based on the user’s exact experience with ChatGPT.

OpenAI ▷ #api-discussions (2 messages):

  • ChatGPT Rewriting Response Issue
  • DALL-E Prompts
  • Canvas Feature

Nous Research AI ▷ #general (67 messages🔥🔥):

  • Free compute offer
  • Nobel Prize in Chemistry
  • Lm Studio updates and MLX
  • New pre-training dataset by LLM360
  • Job recruitment practices

Nous Research AI ▷ #ask-about-llms (3 messages):

  • Llama Stack
  • Fast Inference with Llama 3.1-8B
  • Meta's GitHub Releases

Nous Research AI ▷ #research-papers (4 messages):

  • Text to Video Models
  • O1 Replication Journey
  • Model Merging at Scale
  • Exploration of Free Text to Video Models: A member inquired about the availability of any free text to video model, both animated and non-animated, receiving suggestions for potential models like animate2diff.

    • It appears there is ongoing interest in identifying more options for generating video content from text prompts.
    • Insights from the O1 Replication Journey Report: This report details a groundbreaking approach to AI research, emphasizing transparency and community engagement in replicating OpenAI’s O1 model.
    • The methodology aims to tackle challenges in team-based projects, documenting successes and failures to enhance open science; the authors also claim their journey learning paradigm beat traditional supervised learning by over 8% on the MATH dataset with only 327 training samples.
    • Evaluating Model Merging at Scale: The study investigates model merging, focusing on how expert model size, base model quality, and quantity affect performance, utilizing methods like Averaging and TIES.
    • Key findings suggest that merging is more successful with stronger base models, and larger models enhance generalization capabilities when working with multiple expert models.

Links mentioned:


  • VLM performance timeline
  • Vision-language models
  • Parameter count comparison
  • VLM performance timeline sought: A member shared a link to their VLM performance timeline but expressed a desire to see improvements over time, especially alongside parameter count.

    • They noted that while similar timelines are common for LLMs, such resources for vision-language models remain scarce.
    • Request for better VLM benchmarks: The member asked if anyone has seen a VLM timeline that reflects changes in performance alongside parameter counts or other characteristics.
    • They indicated that such comparisons are more frequently found in discussions about LLMs, making their own attempt feel like a novelty.

OpenRouter (Alex Atallah) ▷ #general (71 messages🔥🔥):

  • Prompt Caching
  • Inflection 3.0 and Enterprise
  • OpenRouter API Rate Limits
  • NotebookLM Deep Dive Podcast
  • User Concerns about Gemini Moderation
  • Prompt Caching Explained: Members discussed the mechanics and usefulness of prompt caching, identifying situations where it may be disadvantageous, such as changing contexts or short prompts.

    • One noted, ‘You cannot disable prompt caching for those providers who do automatic prompt caching,’ highlighting the limitations set by certain providers.
    • Intrigue Surrounding Inflection 3.0: The launch of Inflection 3.0 has sparked curiosity, especially due to its potential integration with Intel Gaudi 3 for improved performance.
    • However, discussions reveal skepticism about the hype, with some members noting they’ve seen minimal concrete information, particularly regarding benchmarks.
    • OpenRouter API Rate Limits: Clarifications were made regarding OpenRouter API request limits, indicating these are dynamic based on account credits.
    • One member shared a GET request example to check rate-limit usage and credits associated with an API key, which can help guide usage; a version of that request is sketched after this list.
    • NotebookLM Podcast Utilization: Members shared positive feedback about the NotebookLM Deep Dive podcast, with some creating notebooks to listen to the content while on the go.
    • One user expressed interest in automation tools like ai-podcast-maker, noting that while the audio may not be as smooth, ‘automation ftw.’
    • Concerns about Gemini Moderation: A user raised concerns regarding whether Gemini moderates inputs, expressing fear about potential bans due to users’ input.
    • This highlights a broader discussion on user experience and content moderation in AI applications.
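
A hedged version of that rate-limit check (OpenRouter's key-info endpoint; assumes the requests package and an OPENROUTER_API_KEY environment variable):

```python
import os
import requests

resp = requests.get(
    "https://openrouter.ai/api/v1/auth/key",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json())  # per the thread, includes credit and rate-limit details
```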

LlamaIndex ▷ #blog (4 messages):

  • LlamaIndex Workflows tutorial
  • LlamaCloud and LlamaParse demo
  • SFTechWeek meetup
  • OpenAI Realtime API Client demo
  • Comprehensive Guide on LlamaIndex Workflows: A detailed tutorial by @jamescalam covers what Workflows are, in comparison to LangGraph, along with how to build an AI research agent.

    • It also includes debugging and optimization tips for getting up and running easily; a toy Workflow is sketched after this list.
    • Using LlamaCloud for Financial Data Analysis: In a recent demo, @ravithejads demonstrated how to utilize LlamaCloud and LlamaParse to fill out financial spreadsheets comparing multiple companies.
    • This use case showcases the practical applications of LLMs in understanding data and automating form filling.
    • Reminder for SFTechWeek Meetup: A last call for attendees to join the in-person meetup at LlamaIndex HQ for discussions on Multi-Agent workflows in production during #SFTechWeek.
    • The event promises food, fun, and insights on handling RAG systems and agent production challenges.
    • Interactive Chat with an AI Agent: A demo featuring @LoganMarkewich showcases chatting with an AI agent using voice through the OpenAI realtime API client.
    • This open-source application enables users to build their own voice agents, with examples provided for immediate use.
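
In the spirit of the Workflows tutorial above, a toy sketch of the event-driven core API (the class, step, and argument names here are ours, not the tutorial's; assumes a recent llama-index-core):

```python
import asyncio
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class EchoFlow(Workflow):
    @step
    async def respond(self, ev: StartEvent) -> StopEvent:
        # A real research agent would branch into tool- and LLM-calling steps here.
        return StopEvent(result=f"echo: {ev.topic}")

async def main():
    print(await EchoFlow(timeout=10).run(topic="AI research agents"))

asyncio.run(main())
```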

LlamaIndex ▷ #general (45 messages🔥):

  • Semantic chunking in TypeScript
  • PropertyGraphIndex extractors
  • Integration issues with LlamaIndex
  • Context chat engine and reranking
  • RAG reducing hallucinations

Latent Space ▷ #ai-general-chat (39 messages🔥):

  • AI girlfriend data breach
  • Sequoia's 3rd annual AI essay
  • Nobel Prize in Chemistry
  • Palmyra X 004 release
  • ChatGPT search rollout
  • AI Girlfriend Service Data Breach Exposed: The AI girlfriend service Muah.ai experienced a data breach last month, compromising 1.9 million email addresses and including sensitive prompts of a sexual nature.

    • Security experts and analysts are concerned about the implications of such data exposure, especially regarding child exploitation details included in the breach.
    • Sequoia Capital’s Insight on AI Evolution: Sequoia’s third annual essay discusses the shift in Generative AI research from ‘thinking fast’ to ‘thinking slow,’ focusing on reasoning during inference time which is unlocking new applications.
    • Key players like OpenAI and Google DeepMind are stabilizing the market, while newer agentic applications are expected to emerge in various sectors.
    • 2024 Nobel Prize in Chemistry Awarded: The 2024 Nobel Prize in Chemistry was awarded to David Baker for computational protein design, and to Demis Hassabis and John M. Jumper for their work in protein structure prediction through AlphaFold2.
    • This recognition highlights the significant contributions of AI in advancing biochemistry, having enabled the prediction of structures for nearly 200 million proteins.
    • Palmyra X 004 Launch Highlights: Writer’s new model, Palmyra X 004, ranked in the top 10 on HELM, introducing full-stack tool calling and training on synthetic data.
    • The release has garnered attention, including coverage from Venture Beat, noting its capabilities in AI function calling and CRM improvements.
    • ChatGPT Introduces Search Functionality: Reports indicate that ChatGPT is rolling out SearchGPT, positioning itself to compete directly with platforms like Perplexity by integrating citation features now in GPT-4o.
    • This move signifies a strategic enhancement in ChatGPT’s capabilities, aligning it closer with information retrieval and user query response needs.

Links mentioned:

  • Tweet from The Nobel Prize (@NobelPrize): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to…
  • Tweet from Alex Volkov (Thursd/AI) (@altryne): We had a “Cursor tips & tricks” meeting today with my colleagues at @weights_biases and I figured I’d share what we ‘discovered’ & shared between us in a 🧵 If you haven’t cha…
  • Tweet from Thomas Schulz (@thomasschulzz): BREAKING: Looks like OpenAI is entering the arena against Perplexity… citations are now in GPT-4o 👀
  • Tweet from Ishaan Kapoor (@Ishaank1999): PDFs are satan’s file format. Almost everyone that builds RAG needs to deal with them – and it sucks. Solutions on the market are either too slow, too expensive or not OSS. It should be easier. …
  • Tweet from Saining Xie (@sainingxie): During my internship at DeepMind, Demis met with all the interns. When asked about the company’s goal, I vividly remember him saying, “winning *multiple* Nobel prizes.” I was shocked at the time, but …
  • Tweet from Sonya Huang 🐥 (@sonyatweetybird): Once a year, @gradypb and I sit down with our trusty AI collaborators 👾 and zoom out to the big picture on what’s happening in Generative AI. Here’s our 3rd annual take… 1: The foundation model lay…
  • Tweet from Sam Julien (@samjulien): 🆕 from @Get_Writer: Palmyra X 004 🎉 Our latest frontier model ranks in the top 10 on both HELM Lite and HELM MMLU and introduces full-stack tool calling to the Writer platform!
  • Tweet from Seán Ó hÉigeartaigh (@S_OhEigeartaigh): It’s not done yet. Hearing reports that the Nobel prize for literature will be going to the authors of “OpenAI’s nonprofit governance structure” for outstanding contributions to creati…
  • Tweet from Troy Hunt (@troyhunt): This was a very uncomfortable breach to process for reasons that should be obvious from @josephfcox’s article. Let me add some more “colour” based on what I found: Quoting Have I Been Pwn…
  • Tweet from The Nobel Prize (@NobelPrize): The 2024 #NobelPrize laureates in chemistry Demis Hassabis and John Jumper have successfully utilised artificial intelligence to predict the structure of almost all known proteins. In 2020, Hassabis …
  • Generative AI’s Act o1: The Agentic Reasoning Era Begins.
  • Tweet from Clara Shih (@clarashih): Last week @OpenAI launched ChatGPT Canvas, an interface that displays text, code, and visualization outputs. In the enterprise, we rely on more structured, trusted UX elements — record details, lists…
  • GitHub – lumina-ai-inc/chunkr: Vision model based PDF chunking. Contribute to lumina-ai-inc/chunkr development by creating an account on GitHub.

Latent Space ▷ #ai-announcements (1 messages):

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom …


Modular (Mojo 🔥) ▷ #general (2 messages):

  • DOM Data Attributes
  • WebAssembly Component Model
  • DOM allows data storage via attributes: A key DOM feature allows storing data on elements through custom attributes with the data- prefix (e.g. data-myattribute), enhancing the ability to associate data directly with HTML elements.

    • This functionality opens up creative avenues for manipulating and retrieving data within the DOM context.
    • WebAssembly Component Model repository announced: The link to the repository for the WebAssembly Component Model has been shared, detailing its design and specifications at WebAssembly/component-model.
    • This repository serves as a crucial resource for those interested in the intricacies of the component model in WebAssembly.

Link mentioned: GitHub – WebAssembly/component-model: Repository for design and specification of the Component Model – WebAssembly/component-model


Modular (Mojo 🔥) ▷ #mojo (24 messages🔥):

  • Mojo and Scikit-learn
  • Mojo GPU Support
  • Running ONNX Models in Mojo
  • Drivers for Mojo GPU Usage
  • Mojmelo: The Mojo Solution for Scikit-learn Pipelines: A member shared Mojmelo, a project for implementing machine learning algorithms in pure Mojo 🔥, as a potential catalyst for running Scikit-learn pipelines.

    • Another argument was made for Mojo’s promise in replacing all Cython dependencies in Scikit-learn.
    • Excitement Around Mojo’s Upcoming GPU Support: Members expressed enthusiasm about the upcoming GPU support in Mojo, highlighting its potential for improved performance.
    • Some are exploring possibilities for integrating PyTorch with Mojo while keeping an eye on GPU capabilities.
    • Drivers Needed for Mojo to Run AI on GPU: It was clarified that using Mojo for AI on GPUs requires an Nvidia driver, with mixed responses about AMD compatibility.
    • Discussions highlighted the significant roles of modern GPU drivers beyond simple communication, such as power management and multiple process handling.
    • ONNX Models in Pure Mojo on GPU: A Possibility?: A user inquired about the potential to run ONNX models on pure Mojo without additional components on the GPU.
    • While the capability remains uncertain, there are queries about future releases enabling this functionality.

Link mentioned: GitHub – yetalit/Mojmelo: Machine Learning algorithms in pure Mojo 🔥. Contribute to yetalit/Mojmelo development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #max (5 messages):

  • Performance of Mojo graphs
  • Pre-compiling graphs
  • Reuse of inference sessions
  • mojo run vs compiled binaries
  • Graph input types

LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):

  • Lab Assignments Released
  • Course Sign Up
  • Discord for Collaboration
  • Lab Completion Criteria
  • Lab Assignments for Course Released: The course’s lab assignments have been officially released, with the first lab focusing on using the Autogen framework to analyze restaurant reviews and due by December 12, 11:59pm PST.

    • Labs 2 and 3 will concentrate on prompt engineering for LLM security, specifically crafting attack and defense prompts.
    • Easy Sign Up for Interested Students: Prospective students are encouraged to sign up for the course by filling out this form.
    • For further discussion, students should join the LLM Agents Discord channel.
    • Utilizing Discord for Questions: Discord is recommended for communicating with course staff and asking lab-related questions, as they will be actively monitoring the channel.
    • Students should consult the ongoing FAQ document before posting questions to avoid redundancy.
    • Collaboration Guidelines Introduced: When collaborating with others in the course, students are urged to avoid sharing exact solutions to maintain academic integrity.
    • Conceptual discussions are encouraged, but specific implementation details and code files should remain private.
    • Lab Completion and Submission Expectations: After lab submissions, students can expect communication regarding their completion status, with defined thresholds for passing various labs: 3/4 for Lab 1, 1/2 of hidden tests for Lab 2, and 1/3 of hidden attack tests for Lab 3.
    • These criteria underscore the importance of not only completing labs but succeeding in the evaluations set forth.

Link mentioned: Large Language Model Agents: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (16 messages🔥):

  • Lab 1 File Issues
  • Quiz Submission Concerns
  • Course Offering Next Semester
  • Ninja & Legendary Tier Requirements
  • Agent Definition Discussion
  • Lab 1 downloads empty files: Multiple users reported that downloading the instructions for Lab 1 resulted in empty files, while Labs 2 and 3 are working correctly.

    • It was clarified that the file is located on Google Drive and confirmed it should be accessible despite the absence of a preview.
    • Clarification on quiz submission email format: A user inquired whether their quiz submissions would be recorded correctly due to a dot in their email, which they usually omit.
    • The response indicated that submissions are tracked by whatever email format was used in the signup form, stressing the importance of entering it accurately.
    • Inquiry on the course offering next semester: A user posed a question about the potential re-offering of the course next semester, seeking confirmation.
    • While there was no certainty, it was mentioned that the professor has previously offered other MOOCs and will likely do so again.
    • Ninja and Legendary Tier requirements for labs: Questions arose about whether lab assignments are required for the Ninja and Legendary tiers, with some finding it odd that labs are tied only to the Mastery tier.
    • It was noted that the expectation is for Ninja and Legendary tier students to prioritize their efforts on hackathon submissions instead.
    • Agent definition debate: A user raised a query about whether a ‘piece of code’ using discriminative AI or a mix of generative and discriminative AI qualifies as an agent.
    • They believed the answer to be yes, indicating some uncertainty around the definitions in the context of AI programming.

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (10 messages🔥):

  • Role of Reinforcement Learning in AGI
  • Session Q&A Clarifications
  • Live Session Video Confusions
  • Collaborative Assignment Brainstorming
  • Discussion on Reinforcement Learning’s Role in AGI: A member raised a question about whether Reinforcement Learning (TD learning) still holds significance in progressing towards AGI, or if agents can function effectively without it.

    • This inquiry opened up a discussion on the necessity and application of RL in modern AI systems.
  • Clarifying Q&A in Last Session: Concerns arose that the previous session lacked a Q&A, with some members stating that it did occur but was not visible in the recorded video.
    • One member referenced a segment from the YouTube video that reportedly included questions.
    • Confusion Over Live Video Content: A member expressed confusion about the recorded live session, stating they could not find the Q&A segment in the video, even though they still had access to it.
    • Another member mentioned that there were indeed questions following the clip that may not have been captured in the video.
    • Call for Collaborative Learning: A member encouraged others to reach out for collaboration in discussing and brainstorming as they work on assignments.
    • This invitation aimed to foster collaborative efforts among peers in tackling their coursework.

Link mentioned: YouTube: no description found


OpenAccess AI Collective (axolotl) ▷ #runpod-help (26 messages🔥):

  • Training Vicuna-7B model
  • CUDA out of memory errors
  • DeepSpeed configuration issues
  • Runpod instance usage



Torchtune ▷ #general (9 messages🔥):

  • Model Scalability Concerns
  • P-value Reporting in ML
  • Implementation of L-mul
  • RL Algorithm Seed Impact
  • Signal vs. Noise in Research

Torchtune ▷ #dev (1 message):

  • SOAP optimizer
  • AdamW learning rate issues
  • NanoGPT speedrunning achievements
  • SOAP outperforms AdamW but needs tuning: A user tested the SOAP optimizer on Alpaca, noting it performed better than AdamW until they adjusted AdamW’s learning rate.

    • However, they mentioned that the current implementation does not support distributed training or bf16 formats yet (a minimal usage sketch follows this list).
    • NanoGPT sets new sample efficiency record: In a recent update, the SOAP optimizer achieved a new sample efficiency record, reaching 3.28 Fineweb validation loss in 3.25B training tokens.
    • This eclipses the previous record of 3.67B tokens needed to reach the same loss, according to a tweet from @kellerjordan0.
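
For anyone who wants to reproduce the comparison, below is a minimal sketch of swapping AdamW for SOAP in a plain PyTorch loop. It assumes the authors' reference implementation (the SOAP class from github.com/nikhilvyas/SOAP, which is not on PyPI); the constructor arguments and hyperparameters shown follow that repo's defaults but may differ between versions, so treat them as illustrative.

```python
import torch
from soap import SOAP  # assumes soap.py from the reference repo is on PYTHONPATH

model = torch.nn.Linear(512, 512)

# SOAP often tolerates a larger learning rate than a comparably tuned AdamW;
# precondition_frequency trades preconditioner freshness against per-step cost.
optimizer = SOAP(model.parameters(), lr=3e-3, weight_decay=0.01,
                 precondition_frequency=10)

for step in range(100):
    x = torch.randn(32, 512)          # stand-in batch; use real data in practice
    loss = model(x).pow(2).mean()     # toy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

As noted above, the current reference implementation targets single-device fp32 training, so distributed or bf16 runs would need extra work.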

Link mentioned: Tweet from Keller Jordan (@kellerjordan0): NanoGPT speedrunning update: Using the SOAP optimizer (https://arxiv.org/abs/2409.11321), @vyasnikhil96 has achieved a new sample efficiency record of 3.28 Fineweb validation loss in 3.25B training to…


Torchtune ▷ #papers (3 messages):

  • Diff Transformer
  • L-Mul Algorithm
  • Floating Point Multiplication Replacement
  • Diff Transformer Triumphs over Traditional Transformers: The Diff Transformer introduces a differential attention mechanism, enhancing attention to relevant context and outperforming traditional Transformers in various benchmarks.

    • It notably aids in long-context modeling and reduces hallucination in tasks like question answering (a toy sketch of the differential attention mechanism follows this list).
    • L-Mul Algorithm Slashes Energy Costs: The proposed L-Mul algorithm approximates floating point multiplication with integer addition, reducing the energy cost of the multiplications by up to 95% while maintaining higher precision than 8-bit floating point multiplication.
    • This suggests potential for vast resource savings in neural network computations (a toy numerical illustration follows this list).
    • Discussion on Pretraining with L-Mul: A query was raised regarding the possibility of pretraining models using the L-Mul algorithm and its impact on performance.
    • There’s interest in evaluating if this approach could also help in addressing the major energy sink during pretraining.
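
To make the differential attention idea concrete, here is a toy single-head sketch of the paper's core computation: two softmax attention maps are formed from separate query/key projections, and their difference, scaled by λ, weights the values. The scalar λ and the shapes are simplifications of the paper's per-head reparameterization, so this is an illustration rather than a reference implementation.

```python
import math
import torch

def diff_attention(x, wq1, wq2, wk1, wk2, wv, lam=0.5):
    d = wq1.shape[1]
    q1, q2 = x @ wq1, x @ wq2
    k1, k2 = x @ wk1, x @ wk2
    v = x @ wv
    a1 = torch.softmax(q1 @ k1.transpose(-2, -1) / math.sqrt(d), dim=-1)
    a2 = torch.softmax(q2 @ k2.transpose(-2, -1) / math.sqrt(d), dim=-1)
    # Subtracting the second map cancels attention mass that both maps place on
    # irrelevant tokens, which is the paper's noise-cancelling intuition.
    return (a1 - lam * a2) @ v

seq, dim = 16, 64
x = torch.randn(seq, dim)
weights = [torch.randn(dim, dim) / math.sqrt(dim) for _ in range(5)]
print(diff_attention(x, *weights).shape)  # torch.Size([16, 64])
```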
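
And here is a toy numerical illustration of the L-Mul approximation: writing each operand as (1 + f) · 2^e, the exact product mantissa (1 + fx + fy + fx·fy) is approximated by dropping the fx·fy cross term and adding a fixed 2^-l correction, so a multiply reduces to additions on mantissas and exponents. This float-based demo only shows the approximation error; the paper's point is that the same computation maps to cheap integer adders in hardware. The formula follows our reading of the paper, and l_bits=4 is an assumption matching its low-bit mantissa setting.

```python
import math

def l_mul(x: float, y: float, l_bits: int = 4) -> float:
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    mx, ex = math.frexp(abs(x))        # abs(x) = mx * 2**ex, with mx in [0.5, 1)
    my, ey = math.frexp(abs(y))
    fx, fy = 2 * mx - 1, 2 * my - 1    # rewrite as (1 + f) * 2**(e - 1)
    # Drop the fx*fy cross term of the exact product and add 2**-l instead.
    return sign * (1 + fx + fy + 2.0 ** -l_bits) * 2.0 ** (ex - 1 + ey - 1)

x, y = 3.7, -2.4
print(x * y, l_mul(x, y))  # exact product vs. L-Mul approximation
```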

Links mentioned:

  • Addition is All You Need for Energy-efficient Language Models: Large neural networks spend most computation on floating point tensor multiplications. In this work, we find that a floating point multiplier can be approximated by one integer adder with high precisi…
  • Differential Transformer: Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, t…

LangChain AI ▷ #general (9 messages🔥):

  • Memcached support in LangChain
  • LiteLLM prompt caching and streaming
  • Natural language to SQL query limitations
  • SQL chain with models other than GPT 3.5
  • Integrating Livekit with LangChain
  • Seeking Memcached Support in LangChain: A member is exploring whether adding support for pymemcache in LangChain would suffice or if multiple Memcached clients like python-memcached or pylibmc are also desired.

    • This request aims to enhance the flexibility of caching options within the LangChain ecosystem (a rough sketch of such a cache follows this list).
    • Problems with LiteLLM’s Streaming and Caching: A member encountered issues retrieving cached tokens when using LiteLLM with streaming enabled and questioned best practices to ensure token caching functionality.
    • They linked to useful resources on LiteLLM highlighting that streamed token responses might interfere with caching mechanisms.
    • Limitations in Natural Language to SQL Queries: A user expressed concerns about effectively limiting SQL queries to a specific ID without trusting LLM instructions and sought alternative methods for maintaining discipline in query generation.
    • Another member suggested that grouping by ID might be necessary to filter results effectively (one way to enforce such a filter outside the LLM is sketched after this list).
    • SQL Chain Compatibility Beyond GPT 3.5: A query was raised about the SQL chain’s compatibility with models other than GPT 3.5, noting that attempts with other models often yielded incorrect responses.
    • A member reported success with 4o-mini by being specific with column names and question formulation.
    • Interest in Livekit Integration with LangChain: A member inquired about the possibility of integrating Livekit with LangChain to enhance its functionality for real-time applications.
    • They also expressed a desire to build a RAG bot, indicating interest in advanced application development using LangChain.
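
As a rough illustration of the Memcached request, here is a hedged sketch of a cache written against langchain_core's BaseCache interface using pymemcache. The key scheme and the text-only JSON serialization are illustrative assumptions, not an existing LangChain integration (a real one would also need to round-trip generation metadata).

```python
import hashlib
import json

from langchain_core.caches import BaseCache
from langchain_core.outputs import Generation
from pymemcache.client.base import Client


class MemcachedCache(BaseCache):
    """Hypothetical Memcached-backed LLM cache; requires a running memcached."""

    def __init__(self, host: str = "localhost", port: int = 11211):
        self.client = Client((host, port))

    def _key(self, prompt: str, llm_string: str) -> str:
        # Memcached keys must be short and byte-safe, so hash the inputs.
        return hashlib.sha256(f"{prompt}:{llm_string}".encode()).hexdigest()

    def lookup(self, prompt: str, llm_string: str):
        raw = self.client.get(self._key(prompt, llm_string))
        if raw is None:
            return None
        return [Generation(text=t) for t in json.loads(raw)]

    def update(self, prompt: str, llm_string: str, return_val) -> None:
        # Stores only the generated text, dropping generation_info for brevity.
        payload = json.dumps([g.text for g in return_val])
        self.client.set(self._key(prompt, llm_string), payload)

    def clear(self, **kwargs) -> None:
        self.client.flush_all()
```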
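
On the SQL-limiting question, one common pattern is to enforce the restriction outside the model entirely: wrap whatever SQL the LLM generates in a subquery and apply the trusted filter with a bound parameter. The sketch below uses an in-memory SQLite table with invented names, and it assumes the generated query exposes a user_id column.

```python
import sqlite3

def run_scoped(conn: sqlite3.Connection, generated_sql: str, user_id: int):
    # The untrusted, model-generated query becomes a subquery; the trusted
    # filter is applied on top with a bound parameter, never string formatting.
    scoped = f"SELECT * FROM ({generated_sql}) AS q WHERE q.user_id = ?"
    return conn.execute(scoped, (user_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 7, 9.99), (2, 8, 5.00)")
print(run_scoped(conn, "SELECT * FROM orders", 7))  # only user 7's rows
```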



OpenInterpreter ▷ #general (8 messages🔥):

  • Mozilla AI open source talk
  • Using --stdin flag confusion
  • LLMs and deterministic outputs
  • Impact of model updates
  • Code outcome variability
  • Get Ready for Mozilla AI Talk!: Next week, we’re excited to host a talk from a member of Mozilla AI discussing intriguing open source initiatives. Don’t miss out on this opportunity to learn more!

    • Join the event here to catch the insights.
    • Confusion Over --stdin Flag: A user expressed confusion about how to use the --stdin flag and mentioned they couldn’t find guidance in the docs, highlighting a gap in documentation clarity.
    • Further clarification is needed in the documentation to assist users in utilizing this feature effectively.
    • LLMs Stay Deterministic with Same Seed: A discussion revealed that LLMs can be deterministic if the same seed and input are used, contrary to popular belief. ChatGPT randomizes the seed on each request to introduce non-determinism.
    • It’s crucial to note that using the same inputs and setting temperature to 0 should yield consistent results (a small API sketch follows this list).
    • Unpredictability with Model Updates: Concerns were raised about model updates in ChatGPT possibly affecting result consistency over time. Changes in the model could lead to variations that disrupt previously deterministic behavior.
    • Users emphasized that updates might introduce unpredictability even when the code remains static.
    • Code Outcome Variability Across Systems: A member pointed out that updates to systems or Python could influence code behavior, resulting in variable outcomes. For instance, accessing user tokens could alter the execution path.
    • This variability underlines the importance of a controlled environment for consistent results.
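
A small sketch of the determinism point using the openai Python client: with temperature=0 and a fixed seed, repeated calls should return the same text, though OpenAI documents seeding as best-effort and model updates can still change outputs. The model name is only an example.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        seed=42,  # fixing the seed removes the deliberate per-request randomness
    )
    return resp.choices[0].message.content

# With identical inputs, seed, and temperature 0, these should usually match.
print(ask("Name one prime number.") == ask("Name one prime number."))
```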

OpenInterpreter ▷ #ai-content (1 message):

8i8__papillon__8i8d1tyr: https://www.youtube.com/watch?v=kNj0O7cKCU4


tinygrad (George Hotz) ▷ #general (3 messages):

  • exo on Linux with clang backend
  • Nix package issues
  • Tinygrad debug mode observations
  • Pull Request #6945 for clang
  • auto-casting bf16 to float32
  • exo fails with clang backend on Linux: A user reported an error when using exo on Linux with the clang backend, specifically citing failure upon invoking the clang command with a lowering error related to MetaOps.KERNEL.

    • They mentioned the issue replicates on two systems and suspect it may be related to the Nix package system.
    • Tinygrad debug mode shows pre-crash activity: While running TINYGRAD_DEBUG=2, detailed activity logs revealed hundreds of operations before a crash, indicating the process runs for some time before failing.
    • Logs included DISK operations and CLANG copy processes, but ultimately concluded in a crash.
    • Discussion on potential fix via GitHub Pull Request #6945: A user suggested that Pull Request #6945 might be a fix for the clang backend issues they’re encountering.
    • The PR involves rewriter hooks to implement autocasting from bf16 to float32, although the rewrite rules need correction.

Link mentioned: WIP: autocast bf16 to float32 for clang by 1ntEgr8 · Pull Request #6945 · tinygrad/tinygrad: I hooked the rewriter using the extra_matcher field of the renderer (mimicking PTX). The rewrite rules are not correct (does not perform the shift), will fix soon. I was able to compile and run the…


tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):

  • Fashion MNIST PR
  • Dataset Suggestions
  • Learning Resources
  • Fashion MNIST adds challenge for tinygrad learners: A member created a Pull Request to introduce Fashion MNIST as an intermediate dataset for those learning tinygrad, providing a challenge that’s more complex than MNIST but simpler than CIFAR-10.

    • The PR aims to give learners an additional practice resource and a useful way to expand their skills (a hedged loading sketch follows this list).
    • Call for more dataset additions: A member inquired if the community would like to see more datasets added and tested for tinygrad to further enhance learning opportunities.
    • This suggestion highlights a shared interest in continually growing the dataset options available for learners.
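
If the PR lands as proposed, loading the new dataset might look like the sketch below; the fashion flag on the existing mnist() loader mirrors the PR's intent, but the final API is up to the maintainers, so treat this as a guess.

```python
from tinygrad.nn.datasets import mnist

# Hypothetical flag from the linked PR; the merged interface may differ.
X_train, Y_train, X_test, Y_test = mnist(fashion=True)
print(X_train.shape)  # (60000, 1, 28, 28), same layout as MNIST
```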

Link mentioned: added beautiful fashion mnist and example by Kinvert · Pull Request #6961 · tinygrad/tinygrad: People learning tinygrad might want a step in difficulty between MNIST and CIFAR-10. This is what I personally did here to keep learning tinygrad. Might be useful to others. Up to you guys if you w…


LAION ▷ #general (1 message):

  • Hierarchical Generation
  • Stable Cascade Models
  • Exploring Hierarchical Generation Models: A member shared their blog post titled A Theory for Coupling Generation and Compression, which discusses a framework for hierarchical model generation similar to Stable Cascade.

    • The post emphasizes the common paradigm in generative models where a decomposer is trained first, highlighting its application to LLMs and image generators.
    • Challenges in Current Generation Paradigm: Current generative model design often follows the same pattern: a decomposing model that compresses data is trained first, and a generator is then trained on the compressed representation.
    • This method is prevalent in LLMs, where the compression speeds up training and inference but leaves models struggling with sub-character spelling.

Link mentioned: coupling generation and compression: no description found


LAION ▷ #research (3 messages):

  • o1-preview Generalization
  • o1-mini Performance
  • AIW Task Issues
  • TruthfulQA Success

DSPy ▷ #show-and-tell (3 messages):

  • The Cat API
  • Cat image fetching tools
  • Cat breeds data
  • Fetching Random Cat Images from The Cat API: A new feature was demonstrated to fetch random cat images using The Cat API. The implementation involves creating a Cat model and using an HTTP client to grab images (a minimal sketch of the pattern follows this list).
  • Exploring Cat Breeds with Limitations: A method to fetch cat breeds with an option to limit the number returned has been showcased. Code snippets reveal that the first few breeds are retrieved and structured into a CatBreed model for easy access.
  • Demonstration Video Links Shared: Links to demonstration videos were shared, highlighting the functionality of the cat image and breed fetching features. These provide visual guides on how to implement the discussed tools effectively.
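
Here is a minimal sketch of the pattern described above: fetching random cat images and a limited list of breeds from The Cat API, parsed into small pydantic models. The Cat and CatBreed fields are assumptions based on the public API's JSON shape, not the presenter's exact code.

```python
import requests
from pydantic import BaseModel

API = "https://api.thecatapi.com/v1"

class Cat(BaseModel):
    id: str
    url: str  # extra JSON fields (width, height, ...) are ignored by default

class CatBreed(BaseModel):
    id: str
    name: str

def random_cats(n: int = 3) -> list[Cat]:
    resp = requests.get(f"{API}/images/search", params={"limit": n}, timeout=10)
    resp.raise_for_status()
    return [Cat(**item) for item in resp.json()]

def breeds(limit: int = 5) -> list[CatBreed]:
    resp = requests.get(f"{API}/breeds", params={"limit": limit}, timeout=10)
    resp.raise_for_status()
    return [CatBreed(**item) for item in resp.json()]

print([c.url for c in random_cats(1)])
print([b.name for b in breeds(3)])
```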

Links mentioned:

  • Cool Stuff for Batman 🦇: Hi, I’m Sean Chatman, a full stack front end developer seeking full-time work. In this video titled Cool Stuff for Batman, I delve into configuring concurrency for meetings in APS models, showcasi…
  • Tool Usage with ToolMixin: Hi there, I’m Sean Chatman, a skilled TypeScript React developer seeking full-time opportunities. I’ve developed the DSL Model Framework, a tool that simplifies DSPy usage with built-in Jinja…

DiscoResearch ▷ #general (1 message):

  • Whisper Turbo German Model
  • Speech Recognition Optimization
  • Whisper Turbo German Model Halves Error Rate: A new model, Whisper Turbo German, cuts error rates roughly in half on some benchmarks compared to earlier models, according to a source.

    • This model is specially optimized for German-language applications such as transcription, voice commands, and automatic subtitling (a standard usage sketch follows this list).
    • Applications of Whisper Turbo Model: Applications of the Whisper Turbo German model include transcription of spoken German, automatic subtitling, and voice-based search queries.
    • It provides dictation functions for word processing programs, enhancing usability in diverse scenarios.
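
Trying the model is standard Hugging Face ASR usage; the sketch below is generic pipeline code rather than anything from the model card, and sample.wav stands in for any German audio file (audio decoding requires ffmpeg).

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="primeline/whisper-large-v3-turbo-german",
)
print(asr("sample.wav")["text"])  # transcribed German speech
```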

Link mentioned: primeline/whisper-large-v3-turbo-german · Hugging Face: no description found


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 message):

  • Writer's Palmyra-X-004 model
  • DevRel inquiries
  • Writer’s Palmyra-X-004 Model Update Request: Sam Julien, leading DevRel at Writer, inquired about adding the latest Palmyra-X-004 model to the leaderboard following an email from CTO Waseem AlShikh.

    • Do we need to submit a PR? Sam expressed confidence in their model’s impressive results internally.
    • Follow-up on Leaderboard Submission Process: Sam asked if they needed to submit a PR for the Palmyra-X-004 model to be added to the leaderboard.
    • This inquiry highlights a proactive approach in ensuring their achievements are recognized within the community.




