How Colossal-AI Advanced the Speedup of Deep Learning in 2022
The year 2022 is coming to an end, and Colossal-AI is just getting started. Together with our ecosystem of users, clients, and research institutions, we reached major milestones in both science and software technology.
Trusted by companies around the globe such as AWS, Meta, BioMap, and Lightning AI, our flagship product Colossal-AI, a deep learning acceleration system, has gained worldwide attention and is being applied in a rapidly growing number of scenarios. Since its release on GitHub in October 2021, Colossal-AI has reached over 7,000 stars and more than 29,000 downloads.
Let’s take a look back at how far we’ve come this year.
HPC-AI Tech Completed a $6 Million Seed and Angel Round Fundraising
We secured $6 million in seed and angel round funding. BlueRun Ventures led the angel round, and Sinovation Ventures and ZhenFund jointly led the seed round. Both rounds were completed within one year. The funds will mainly be used to recruit talent and expand the company’s business.
Colossal-AI Features A Complete Open Source Stable Diffusion Pretraining and Fine-tuning Solution
Building on its experience with large model acceleration, the Colossal-AI team released a complete open source Stable Diffusion pretraining and fine-tuning solution. This solution reduces the pretraining cost by a factor of 6.5 and the hardware cost of fine-tuning by a factor of 7, while also speeding up both processes. Fine-tuning can be completed on a PC with an RTX 2070/3050 GPU, making AIGC models such as Stable Diffusion available to those without access to high-end hardware.
New FastFold Version Accelerates AlphaFold Inference by 5 Times and Reduces GPU Memory by 75%
By introducing fine-grained memory management optimization, the new version of FastFold reduces memory usage when inferring protein structures of 1,200 residues from 16 GB to 5 GB (single precision). The new version also leverages dynamic axial parallelism and full-process parallelization to accelerate pre-processing, improving large model performance while significantly reducing inference time. According to our experiments, it can accelerate AlphaFold inference by 5 times and reduce GPU memory by 75%.
xTrimo Multimer Accelerates Structure Prediction of Protein Monomers and Multimers by 11 Times
xTrimo Multimer, the latest solution from the Colossal-AI team and BioMap for protein monomer and multimer structure prediction, was recently open-sourced. It can predict monomer and multimer structures simultaneously, accelerating the process by up to 11 times! As an important application of the Colossal-AI system in the pharmaceutical industry, xTrimo Multimer can greatly increase the pace of model design and development for protein structure prediction, enabling breakthroughs for large AI model applications in healthcare and bioinformatics.
Energon-AI Surpassed NVIDIA FasterTransformer’s Inference Performance by 50%
The Colossal-AI team developed the Energon-AI subsystem to provide inference services for super-scale deep learning models. With minimal adjustments to existing projects, users can easily deploy large models for inference and achieve superlinear speedups when scaling out in parallel. Compared to NVIDIA’s FasterTransformer, Energon-AI achieved a 50% improvement in parallel inference speedup on large AI models.
Initial Release of FastFold Facilitates Drug Research and Development By Reducing Training Time From 11 Days to 67 Hours
Based on the open-source AlphaFold, the Colossal-AI team and Helixon jointly introduced FastFold, which brings GPU optimization and large-scale model training technology to AlphaFold and applies them to the training and inference of protein structure prediction models. Compared to AlphaFold, FastFold reduced the overall training time from 11 days to 67 hours at a lower cost. It also achieved a 9.3 to 11.6 times speedup in long-sequence inference.
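To put the training-time figures above on a common scale, here is a minimal Python sketch that converts them into a speedup factor. It uses only the two numbers quoted above (11 days and 67 hours); the helper name speedup is ours, chosen for illustration, and is not part of FastFold or Colossal-AI.

```python
def speedup(baseline_hours: float, optimized_hours: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_hours <= 0 or optimized_hours <= 0:
        raise ValueError("durations must be positive")
    return baseline_hours / optimized_hours


HOURS_PER_DAY = 24
baseline_hours = 11 * HOURS_PER_DAY  # AlphaFold training time quoted above: 11 days
fastfold_hours = 67                  # FastFold training time quoted above

factor = speedup(baseline_hours, fastfold_hours)
time_saved = 1 - fastfold_hours / baseline_hours

print(f"speedup: {factor:.2f}x, time saved: {time_saved:.1%}")
# prints "speedup: 3.94x, time saved: 74.6%"
```

In other words, the quoted reduction corresponds to roughly a 4 times faster training run.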
Sky Computing Successfully Accelerated Federated Learning with Hybrid Distributed Computing Features
Have you ever imagined AI infrastructure built at a macro level, distributing resources across continents while maintaining data privacy? Sky Computing makes this concept a reality. It uses hybrid distributed computing features to accelerate federated learning by 55% while ensuring user data privacy.
Colossal-AI Presented At Notable International Conferences Such As SuperComputing and AWS Summit
The Colossal-AI team has been invited to deliver keynote speeches at a series of notable international conferences, including the SuperComputing conference in Dallas, the Open Data Science Conference in San Francisco, the World Artificial Intelligence Conference in Shanghai, and the AWS Summit in China.
HPC-AI Tech Joined NVIDIA Inception
In June 2022, we announced that we had joined NVIDIA Inception, a program that nurtures startups revolutionizing industries through technological advancements. NVIDIA Inception will allow us to evolve faster by granting access to cutting-edge technology and NVIDIA experts, connections with venture capitalists, and co-marketing support to heighten our visibility. The program will also present an opportunity for us to collaborate with industry-leading experts and other AI-driven organizations.
We Joined New York University’s Endless Frontier Labs Program
We were chosen from 1,121 applicants to join the Digital Tech track of the Endless Frontier Labs (EFL) 2022–2023 cohort after a rigorous selection process by the EFL team. New York University’s EFL program gives early-stage science and technology startups an opportunity to grow in partnership with the New York University Stern School of Business. This year, over 1,100 startups from 66 countries worldwide and from 43 U.S. states participated in the competition. We are among only 78 startups selected as finalists for the program.
First Official Release of Colossal-AI on April 5th, 2022
The first official release of Colossal-AI in April 2022 included a brand new ZeRO module, a beta version of a Profiler TensorBoard plugin (with finer-grained monitoring than PyTorch for communication, memory, and more), and a Mixture of Experts (MoE) feature with an example. We also enhanced the documentation to provide in-depth tutorials.
Thank You!
Thank you so much for your trust and dedication to Colossal-AI since our inception! In the upcoming year 2023, we will continue to be committed to helping enterprises maximize the efficiency of artificial intelligence deployment while minimizing costs. We look forward to working hand in hand with you in the future. Let’s continue our joint success!
Merry Christmas and a Happy New Year!
Your Colossal-AI team