Nature image datasets, with millions of photos, aid ecologists in studying behaviors and responses to climate change. Multimodal vision language models can improve image retrieval for researchers, but need more domain-specific training data for complex...
MIT faculty and alumni named 2024 AI2050 Fellows by Schmidt Futures to tackle hard AI problems. David Autor and Sara Beery among honorees for innovative AI research...
CV VideoPlayer, a Python package for computer vision research, simplifies video visualization and debugging with interactive features. It allows easy customization of overlays and frame edits, enhancing the development process for...
Pixtral 12B, Mistral AI's cutting-edge vision language model, excels in text-only and multimodal tasks, outperforming other models. It features a novel architecture with a 400-million-parameter vision encoder and a 12-billion-parameter transformer decoder, offering high performance and speed for understanding images and...
Learn to chat with images using Llama 3.2-Vision, a cutting-edge multimodal LLM by Meta. Explore its OCR and reasoning skills in a Colab notebook for local...
Nobel-winning economist Daron Acemoglu examines AI's impact on economic growth and productivity, estimating a modest increase of 1.1 to 1.6 percent in GDP over the next decade. Research shows about 20-23% of U.S. job tasks could be automated with AI, with potential cost savings of...
Syngenta and AWS collaborated to develop Cropwise AI, powered by Amazon Bedrock Agents, to streamline seed selection for farmers and sales reps. Generative AI transforms the decision-making process, offering personalized recommendations at scale for a more efficient and precise selection...
Multimodal embeddings map text and image data into a single shared representation, enabling cross-modal applications like image captioning and content moderation. CLIP aligns text and image representations for zero-shot image classification, showcasing the power of shared embedding...
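To make the zero-shot idea concrete, here is a minimal sketch of classification in a shared embedding space. It assumes the image and each candidate caption have already been encoded into vectors of the same dimension; the three-dimensional toy vectors, the function names, and the temperature value are illustrative assumptions, not output from a real CLIP model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, label_embs, temperature=0.07):
    """Score each caption against the image and softmax the similarities.

    `temperature` is an assumed scaling constant for this sketch.
    """
    labels = list(label_embs)
    logits = [cosine(image_emb, label_embs[label]) / temperature for label in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Hypothetical toy embeddings; real ones would come from trained encoders.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}
probs = zero_shot_classify(image_emb, label_embs)
best = max(probs, key=probs.get)
```

No classifier is trained on the labels here: adding a new class only requires embedding one more caption, which is what makes the approach "zero-shot".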
Inference for the Meta Llama 3.1 8B and 70B LLMs is now supported on AWS Trainium and Inferentia instances. SageMaker JumpStart offers secure deployment of pre-trained models for customization and...
A solution built on AWS generative AI services such as Amazon Bedrock, combined with OpenSearch, simplifies vehicle damage appraisals for insurers, repair shops, and fleet managers. By converting image and metadata to numerical vectors, this approach streamlines the process and provides valuable insights for informed decision-making in the automotive...
Histogram of Oriented Gradients (HOG) is a key feature extraction algorithm for object detection and recognition tasks, utilizing gradient magnitude and orientation to create meaningful histograms. The HOG algorithm involves calculating gradient images, creating histograms of gradients, and normalizing to reduce lighting...
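The three steps named above (gradients, orientation histograms, normalization) can be sketched for a single cell as follows. This is a simplified illustration under stated assumptions: it uses central differences, hard assignment to 9 unsigned-orientation bins, and per-cell L2 normalization, whereas full HOG implementations typically interpolate votes between bins and normalize over overlapping blocks of cells. The function names are hypothetical.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one cell.

    `cell` is a list of rows of grayscale values. Border pixels are skipped
    so that central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(hist, eps=1e-6):
    """Scale the histogram to unit length to reduce sensitivity to lighting."""
    norm = math.sqrt(sum(v * v for v in hist) + eps ** 2)
    return [v / norm for v in hist]


# A vertical edge: dark on the left, bright on the right.
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(cell)
normalized = l2_normalize(hist)

# Doubling the brightness doubles the raw histogram but not the normalized one.
brighter = [[2 * v for v in row] for row in cell]
normalized_brighter = l2_normalize(cell_histogram(brighter))
```

The comparison at the end illustrates why the normalization step matters: a uniform change in contrast scales every gradient magnitude, and dividing by the histogram's length cancels that scaling.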
Customized model monitoring with Amazon SageMaker is crucial for real-time AI/ML scenarios. SageMaker Model Monitor offers advanced capabilities for monitoring model quality and handling multi-payload requests, accelerating customized model monitoring...
Engage in Relational Deep Learning (RDL) by directly training on your relational database, transforming tables into a graph for efficient ML tasks. RDL eliminates feature engineering steps by learning from raw relational data, enhancing model performance and...
MIT researchers propose Diffusion Forcing, a new training technique that combines next-token prediction and full-sequence diffusion for flexible, reliable sequence generation. This method enhances AI decision-making, improves video quality, and aids robots in completing tasks by predicting future steps with varying noise...
Florence-2 by Microsoft, a compact Vision-Language Model, excels in image annotation tasks with zero-shot capabilities. Pre-trained on FLD-5B, it supports tasks like captioning, object detection, segmentation, and OCR in a single...
AI models like ChatGPT are ubiquitous and beneficial, but Generative AI poses challenges with misinformation and ethical concerns. Hype around AI, exemplified by NVIDIA's stock surge, raises questions about its societal impact and potential...
Meta's Data for Good program is open-sourcing AI-powered population maps on GitHub, aiding climate adaptation and disaster response projects worldwide. By providing training data and code, Meta hopes to improve global disaster preparedness and climate adaptation efforts through accurate population...
Training computer vision models with Ultralytics' YOLOv8 is now easier using Python, CLI, or Google Colab. YOLOv8 is known for accuracy, speed, and flexibility, offering local or cloud-based training options, such as Google Colab for enhanced computation...
MIT engineers have developed Clio, a method enabling robots to make intuitive, task-relevant decisions by identifying and remembering only relevant elements in a scene. Clio's capabilities, showcased in real experiments, could be crucial for search and rescue missions, domestic robots, and factory automation, according to...
In 2004, Diana Duyser auctioned a grilled cheese sandwich she had made in 1994, said to bear the Virgin Mary's image, for $28,000. MIT's study on pareidolia reveals human-machine perception differences and a possible evolutionary link to survival...
Northpower, a major infrastructure contractor in New Zealand, utilizes AI to prioritize public safety risks, reducing effort and carbon emissions. Facing challenges in inspecting power poles for safety, Northpower combines digital and scanned data to efficiently identify and address potential...
Tesla and others face challenges infusing robots with AI. Boston Dynamics' Atlas robot raises hopes for a multipurpose domestic...
Meta and Waymo introduce the Transfusion model, which combines transformer and diffusion approaches for multi-modal prediction. The model uses bidirectional transformer attention for image tokens and pre-training tasks for text and...
Cohere Rerank 3 Nimble FM enhances enterprise search systems, improving speed and accuracy by reordering relevant documents efficiently. Amazon SageMaker JumpStart provides access to pre-trained models like Cohere Rerank 3 Nimble, enabling customization for specific use cases without starting from...
Decoding ML job roles is key to interview success. Understanding the spectrum of roles can refine your strategy and boost...
Integrating Batch Normalization in a ViT architecture reduces training and inference times by over 60%, maintaining or improving accuracy. The modification involves replacing Layer Normalization with Batch Normalization in the encoder-only transformer...
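The difference between the two normalization layers comes down to which axis the statistics are computed over. The sketch below contrasts them on a tiny batch of token vectors. It is an illustration only: the learnable scale and shift parameters are omitted, batch normalization is shown in training mode (at inference it uses stored running statistics instead), and the function names are hypothetical rather than taken from the article.

```python
from statistics import fmean, pvariance


def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def layer_norm(batch):
    """Layer Normalization: statistics per token, across its features."""
    return [_normalize(token) for token in batch]


def batch_norm(batch):
    """Batch Normalization: statistics per feature, across all tokens."""
    columns = [_normalize(column) for column in zip(*batch)]
    return [list(row) for row in zip(*columns)]


# Two tokens with three features each (toy values).
tokens = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
ln = layer_norm(tokens)  # each row now has zero mean
bn = batch_norm(tokens)  # each column now has zero mean
```

Layer normalization must recompute a mean and variance for every token at inference time, while batch normalization's statistics are fixed once training ends, which is one commonly cited reason it can be cheaper to run.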
MIT CSAIL researchers developed RialTo, a system that creates digital twins for training robots in specific environments faster and more effectively. RialTo improved robot performance by 67% in various tasks, handling disturbances and distractions with...
Amazon Forecast, launched in 2019, offers accurate time series forecasts. SageMaker Canvas provides faster model building, cost-effective predictions, and enhanced transparency for ML models, including time series...
NVIDIA unveils generative physical AI advancements at SIGGRAPH, including NIM microservices for building interactive visual AI agents and training physical machines. The technology transforms industries like manufacturing and healthcare, enabling robots and automation to navigate their surroundings more...
Llama 3.1's multilingual LLMs, available on Amazon SageMaker JumpStart, offer optimized generative AI models for developers and businesses. SageMaker JumpStart provides access to pre-trained foundation models, allowing for customization and secure deployment in a dedicated VPC...
Satellite imagery enhances monitoring of Earth's changes, but clouds obscure the surface, which makes cloud segmentation crucial. Algorithms like Random Forest and YOLO are compared for cloud removal in Sentinel-2 images. Access data through Copernicus Open Access Hub, Google Earth Engine, or Python package...
Foundation models, like Large Language Models (LLMs), are being adapted for time series modeling through Large Time Series Foundation Models (LTSM). By leveraging sequential data similarities, LTSM aims to learn from diverse time series data for tasks like outlier detection and classification, building on the success of LLMs in computational linguistic...
TDS celebrates milestone with engaging articles on cutting-edge computer vision and object detection techniques. Highlights include object counting in videos, AI player tracking in ice hockey, and a crash course on autonomous driving...
MusGConv introduces a perception-inspired graph convolution block for processing music score data, improving efficiency and performance in music understanding tasks. Traditional MIR approaches are enhanced by MusGConv, which models musical scores as graphs to capture complex, multi-dimensional music...
PyTorch 2.0 introduced torch.compile for faster code execution. AWS optimized torch.compile for Graviton3 processors, resulting in significant performance improvements for NLP, CV, and recommendation...
Yann LeCun's 1989 breakthrough with Convolutional Neural Networks preserved spatial image data, revolutionizing Computer Vision research. CNNs use filters to extract feature maps, stacking layers to create powerful image...
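As a minimal sketch of how a filter produces a feature map, the function below slides a small kernel over a 2D image and sums the element-wise products at each position (no padding, stride 1). The kernel values and the helper name are illustrative assumptions; in a trained CNN the kernel weights are learned rather than hand-written.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map.

    Both arguments are lists of rows. As in most deep learning libraries,
    the kernel is applied without flipping (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A dark-to-bright vertical edge in the middle of the image.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filter that responds to left-to-right increases in brightness.
vertical_edge = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, vertical_edge)
# feature_map == [[0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output sits exactly where the edge is, which shows how the spatial layout of the input is preserved in the feature map.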
Transformers, known for revolutionizing NLP, now excel in computer vision tasks. Explore the Vision Transformer and Masked Autoencoder Vision Transformer architectures enabling this...
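Both architectures start by cutting the image into fixed-size patches that play the role of tokens, and the masked autoencoder variant then hides most of them from the encoder. The sketch below shows those two steps only, assuming image dimensions divisible by the patch size; the helper names, the seed, and the 0.75 mask ratio used in the example are assumptions for illustration.

```python
import random


def patchify(image, patch):
    """Split a 2D image into flattened, non-overlapping patch x patch tiles.

    Assumes the image height and width are multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[top + i][left + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def random_mask(num_patches, mask_ratio, seed=0):
    """Return (visible, masked) patch indices for a masked autoencoder."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


# A 4x4 toy image whose pixel values equal their row-major index.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(image, 2)  # four patches of four pixels each
visible, masked = random_mask(len(patches), mask_ratio=0.75)
```

In a full model, each flattened patch would be linearly projected and given a position embedding before entering the transformer; only the `visible` patches would be passed to the encoder, with the decoder asked to reconstruct the `masked` ones.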
MIT and Meta researchers develop PlatoNeRF, a computer vision technique using shadows and machine learning to create accurate 3D models of scenes, improving autonomous vehicles and AR/VR efficiency. Combining lidar and AI, PlatoNeRF offers new opportunities for reconstructions and will be presented at the Conference on Computer Vision and Pattern...
MIT researchers found that large language models can understand the visual world and generate complex scenes. By querying LLMs to self-correct code for images, they improved simple drawings and trained a vision system without using visual...
Sprinklr utilizes AI to enhance customer experience, achieving 20% throughput improvement with AWS Graviton3 for cost-effective ML inference. Thousands of servers fine-tune and serve over 750 AI models across 60+ verticals, processing 10 billion predictions...
Scientists are using AI to identify advanced materials for solar cells. MIT engineers develop a computer vision technique to speed up material characterization by 85 times, aiming for fully automated materials...
Choosing the right AI use case is crucial for success. AI can be valuable even with moderate performance, offering unique solutions. Examples include Sensor Fusion and Generative AI in everyday...
Scientists at MIT and the MIT-IBM Watson AI Lab have developed a new approach to teach computers to pinpoint actions in videos using only transcripts. This method, called spatio-temporal grounding, improves accuracy in identifying actions in longer videos and could have applications in online learning and...
New study reveals groundbreaking AI technology developed by Google, revolutionizing data analysis in healthcare. Findings show significant increase in accuracy and efficiency of diagnosing rare...
Discover the groundbreaking collaboration between Tesla and SpaceX, revolutionizing electric vehicles and space travel. Explore how their innovative technologies are shaping the future of...
Discover how Tesla's new self-driving technology is revolutionizing the automotive industry. With advanced AI algorithms and cutting-edge sensors, Tesla is paving the way for autonomous...
Discover how innovative startups are revolutionizing the tech industry with cutting-edge products. From AI-powered solutions to sustainable technologies, these companies are reshaping the...
Discover the groundbreaking collaboration between Tesla and SpaceX in developing innovative renewable energy solutions. Explore how Elon Musk's vision is revolutionizing the future of transportation and space...
Discover the latest groundbreaking research on AI technology by leading companies like Google and IBM. Learn about the potential impact on various industries and the future of artificial...
3D Gaussian splatting, a new method for novel view synthesis, is challenging NeRFs as the predominant technique for 3D scene representation. This technique utilizes anisotropic Gaussians to render crisp 3D models in real-time, providing a unique approach to scene representation and image...
Recent advancements in AI, including GenAI and LLMs, are revolutionizing industries with enhanced productivity and capabilities. Vision transformers (ViTs) are reshaping computer vision, offering superior performance and scalability compared to traditional...
Access Sun RGB-D dataset for 3D understanding from 2D images. Dataset includes indoor scenes with 2D and 3D annotations from various 3D scanners. Explore Python code to access this valuable resource for deeper ML...
MIT researchers developed a dataset to simulate peripheral vision in AI models, improving object detection. Understanding peripheral vision in machines could enhance driver safety and predict human behavior, bridging the gap between AI and human...
Article highlights deploying ML models in the cloud, combining CS and DS fields, and overcoming memory limitations in model deployment. Key technologies include Detectron2, Django, Docker, Celery, Heroku, and AWS...
This article discusses the importance of high-quality data and reducing labeling errors in pose estimation models. It demonstrates how a custom labeling workflow in Amazon SageMaker Ground Truth can streamline the labeling process and minimize errors, ultimately reducing the cost of obtaining accurate pose...
Automate mortgage document fraud detection using ML models and business-defined rules with Amazon Fraud Detector, a fully managed fraud detection service. Upload historical data, train the model, review performance, and deploy the API to make predictions for improved fraud detection and underwriting...
Automate detecting document tampering and fraud at scale using AWS AI and machine learning services for mortgage underwriting. Develop a deep learning-based computer vision model to detect and highlight forged images in mortgage underwriting using Amazon...
AI technology has the ability to transform food images into recipes, allowing for personalized food recommendations, cultural customization, and automated cooking execution. This innovative method combines computer vision and natural language processing to generate comprehensive recipes from food images, bridging the gap between visual depictions of dishes and symbolic...
MIT's Improbable AI Lab has developed a multimodal framework called HiP, which uses three different foundation models to help robots create detailed plans for complex tasks. Unlike other models, HiP does not require access to paired vision, language, and action data, making it more cost-effective and...
This article explores monocular depth estimation (MDE) and its importance in computer vision applications. It provides a walkthrough on loading and visualizing depth map data, running inference with Marigold and DPT, and evaluating depth predictions using the SUN RGB-D...
The article explores the use of lightweight hierarchical vision transformers in autonomous robotics, highlighting the effectiveness of a shared trunk concept for multi-task learning. It also discusses the emergence of large multimodal models and their potential to create a unified architecture for end-to-end autonomous driving...
Computer vision has evolved from small pixelated images to generating high-resolution images from descriptions, with smaller models improving performance in areas like smartphone photography and autonomous vehicles. The ResNet model has dominated computer vision for nearly eight years, but challengers like Vision Transformer (ViT) are emerging, showing state-of-the-art performance in computer...
The PGA TOUR is developing a next-generation ball position tracking system using computer vision and machine learning techniques to locate golf balls on the putting green. The system, designed by the Amazon Generative AI Innovation Center, successfully tracks the ball's position and predicts its resting...
2024 could be the tipping point for Music AI, with breakthroughs in text-to-music generation, music search, and chatbots. However, the field still lags behind Speech AI, and advancements in flexible and natural source separation are needed to revolutionize music interaction through...
Gaussian splatting is a fast and interpretable method for representing 3D scenes without neural networks, gaining popularity in a world obsessed with AI models. It uses 3D points with unique parameters to closely match renders to known dataset images, offering a refreshing alternative to complex and opaque methods like...
Autonomous machines in robotics showcased their capabilities in 2023, with notable mentions including Glüxkind's AI-powered smart stroller, Soft Robotics' mGripAI system for food packing, and Quanta's TM25S robot for product inspection, all utilizing NVIDIA...
ICL, a multinational manufacturing and mining corporation, developed in-house capabilities using machine learning and computer vision to automatically monitor their mining equipment. With support from the AWS Prototyping program, they were able to build a framework on AWS using Amazon SageMaker to extract vision from 30 cameras, with the potential to scale to...