BAFLineDP: Code Bilinear Attention Fusion Framework for Line-Level Defect Prediction
The paper presents a line-level defect prediction method grounded in a code bilinear attention fusion framework (BAFN).
BAFN combines global and local information: it captures bilinear interaction attention weights between each code line and its line-level context, and uses them to construct defect code features.
github: https://github.com/insoft-lab/BAFLineDP
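The idea of bilinear attention between a code line and its context can be illustrated with a minimal sketch. It assumes each code line and each context item are already embedded as small vectors. The bilinear score h^T W c, the softmax over context items, and the concatenation-based fusion are illustrative assumptions rather than BAFN's actual architecture, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_score(h: Vector, W: Matrix, c: Vector) -> float:
    """Bilinear interaction h^T W c between a line embedding h and a context embedding c."""
    return sum(h[i] * W[i][j] * c[j] for i in range(len(h)) for j in range(len(c)))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_line_with_context(h: Vector, contexts: List[Vector], W: Matrix) -> Vector:
    """Hypothetical fusion: weight context vectors by bilinear attention, then
    concatenate the line's own (local) features with the attended (global) context."""
    weights = softmax([bilinear_score(h, W, c) for c in contexts])
    dim = len(contexts[0])
    attended = [sum(w * c[k] for w, c in zip(weights, contexts)) for k in range(dim)]
    return h + attended  # list concatenation: [line features | context features]


if __name__ == "__main__":
    line = [1.0, 0.0]
    ctx = [[1.0, 0.0], [0.0, 1.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity W: the score reduces to a dot product
    print(fuse_line_with_context(line, ctx, W))
```

In a trained model W would be learned, so the attention can express interactions between line and context features that a plain dot product cannot.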
LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem
The paper presents a vision for the future of computing within the AIOS-Agent ecosystem, where the LLM functions as the core of AIOS (Artificial Intelligent Operating System).
Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch
Researchers recently proposed DoRA: Weight-Decomposed Low-Rank Adaptation. DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, and employs LoRA specifically for the directional updates to keep the number of trainable parameters small. According to the authors, DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
To understand how these methods work, Sebastian Raschka's article suggests implementing both LoRA and DoRA from scratch in PyTorch.
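The decomposition can be sketched in plain Python (no PyTorch, so the example stays self-contained). Following the DoRA formulation, the merged weight is W' = m · (W0 + BA) / ||W0 + BA||_c, where ||·||_c is the column-wise L2 norm, B and A are the LoRA factors applied to the direction, and m is a trainable magnitude vector initialized to ||W0||_c. The helper names and the single scaling factor `alpha` are assumptions for illustration, not code from Raschka's article.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X: Matrix, Y: Matrix, scale: float = 1.0) -> Matrix:
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def column_norms(W: Matrix) -> List[float]:
    """L2 norm of each column of W (the ||.||_c of the DoRA formulation)."""
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0]))]


def dora_weight(W0: Matrix, B: Matrix, A: Matrix, m: List[float],
                alpha: float = 1.0) -> Matrix:
    """Merged DoRA weight m * V / ||V||_c with direction V = W0 + alpha * B @ A.

    W0: d x k frozen pre-trained weight; B: d x r and A: r x k are the LoRA
    factors; m: length-k magnitude vector. `alpha` is an assumed simple scale.
    """
    V = add(W0, matmul(B, A), alpha)  # direction: frozen weight plus low-rank update
    norms = column_norms(V)
    # Normalize each column to unit length, then rescale it by its magnitude m[j].
    return [[m[j] * row[j] / norms[j] for j in range(len(row))] for row in V]
```

With the usual initialization (m = column_norms(W0) and B all zeros) the merged weight equals W0, so fine-tuning starts from the pre-trained model; afterwards, training m changes only column lengths while training B and A changes only directions.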
Context-Aware Code Generation
In practice, you rarely write code in isolation. As a rule, the code we write becomes part of a project and is closely tied to it conceptually, syntactically, and stylistically. How can we ensure this when generating code with an LLM? If the project is small, the entire codebase can be passed as context. For a large project this trick does not work, and approaches such as RAG (this one or this one) are needed to fetch relevant information from existing code repositories and use it to generate accurate code and documentation, or even to fix bugs.
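The retrieval step of such a pipeline can be sketched as follows. This minimal version assumes the repository is a dictionary mapping file paths to code and ranks files by Jaccard token overlap with the request; real RAG systems usually rely on embedding similarity instead, so the scoring function, the prompt layout, and all function names here are hypothetical stand-ins.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; snake_case identifiers split at underscores."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, snippets: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (path, score) pairs whose code overlaps most with the query."""
    q = tokenize(query)
    scored = [(path, jaccard(q, tokenize(code))) for path, code in snippets.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, snippets: Dict[str, str], k: int = 2) -> str:
    """Prepend the retrieved project code to the task so the LLM sees local conventions."""
    parts = [f"# File: {path}\n{snippets[path]}" for path, _ in retrieve(query, snippets, k)]
    return "Relevant project code:\n\n" + "\n\n".join(parts) + f"\n\nTask: {query}\n"


if __name__ == "__main__":
    repo = {
        "db.py": "def load_user(user_id):\n    return db.query(User, user_id)",
        "render.py": "def render_page(template):\n    return template.render()",
    }
    print(build_prompt("write a function to load a user by id", repo, k=1))
```

Only the top-k files enter the prompt, which is what keeps this workable for repositories too large to pass in full.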
New Breakthrough Brings Matrix Multiplication Closer to Ideal
In 1969, a breakthrough result by Strassen showed that n × n matrices can be multiplied faster than with the naive cubic-time algorithm. Since then, there has been an explosion of results obtaining successively smaller upper bounds on the exponent ω, defined as the smallest constant such that, for all ε > 0, n × n matrices can be multiplied using O(n^{ω+ε}) arithmetic operations.
The new bound is ω < 2.371552.
paper: https://epubs.siam.org/doi/10.1137/1.9781611977912.134
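For intuition about how the exponent drops below 3, here is a sketch of Strassen's original 1969 algorithm (not the laser-method techniques behind the new bound). It multiplies 2 × 2 block matrices with 7 recursive products instead of 8, giving T(n) = 7T(n/2) + O(n²) = O(n^{log₂ 7}) ≈ O(n^{2.807}). The sketch assumes n is a power of two and is meant for illustration, not speed.

```python
from typing import List

Matrix = List[List[float]]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def strassen(A: Matrix, B: Matrix) -> Matrix:
    """Multiply n x n matrices (n a power of two) using Strassen's 7-product recursion."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def split(M: Matrix):
        # Four h x h quadrants: top-left, top-right, bottom-left, bottom-right.
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products replace the eight of the naive block algorithm.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(M1, M2), add(M3, M6))
    # Reassemble the quadrants into the full result.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

The extra additions cost only O(n²) per level, which is why saving a single multiplication per level changes the exponent.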
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLM
The authors conducted a large-scale human evaluation of HumanEval and MBPP, two popular benchmarks for Python code generation, analyzing their diversity and difficulty. They found a critical bias toward a limited set of programming concepts.
To address these limitations, the authors propose a new benchmark, PythonSaga, featuring 185 hand-crafted prompts with a balanced representation of 38 programming concepts across diverse difficulty levels.
Introducing Devin, the first AI software engineer
Devin is equipped with common developer tools, including a shell, a code editor, and a browser, within a sandboxed compute environment: everything a human would need to do their work. Devin can actively collaborate with the user: it reports on its progress in real time and accepts feedback.
On SWE-bench, Devin correctly resolves 13.86% of issues end-to-end, exceeding the previous state of the art of 1.96%.
To hire Devin for engineering work: waitlist.
CAM: A Collection of Snapshots of GitHub Java Repositories Together with Metrics
CAM (Classes and Metrics) is open-source software that clones Java repositories from GitHub, filters out unnecessary files, parses Java classes, and computes metrics such as Cyclomatic Complexity, Halstead Effort and Volume, C&K metrics, Maintainability Metrics, LCOM5 and HND, as well as some Git-based metrics.
The latest 2.2 GB archive is published on Amazon S3 and includes 532K Java classes with 48 metrics per class.
github
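Two of the metrics listed above follow well-known formulas. For Halstead: vocabulary n = n1 + n2 (distinct operators and operands), length N = N1 + N2 (total occurrences), volume V = N · log₂ n, difficulty D = (n1 / 2) · (N2 / n2), and effort E = D · V. The sketch below assumes tokens have already been classified into operators and operands, and approximates McCabe's cyclomatic complexity as one plus the number of decision keywords and boolean operators matched by a regex. Both functions are hypothetical helpers, not CAM's implementation, and the regex heuristic will miscount cases such as `?` in generic wildcards.

```python
import math
import re
from typing import Dict, List


def halstead(operators: List[str], operands: List[str]) -> Dict[str, float]:
    """Halstead volume, difficulty and effort from pre-classified token lists
    (assumes at least one operand and a vocabulary of two or more tokens)."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total occurrences
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2) * (N2 / n2)
    return {"volume": volume, "difficulty": difficulty, "effort": difficulty * volume}


# Heuristic decision points in Java source: branching keywords, && / ||, and ternary ?.
DECISION_POINTS = re.compile(r"\b(if|for|while|case|catch)\b|&&|\|\||\?")


def cyclomatic_complexity(java_method_body: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points found."""
    return 1 + len(DECISION_POINTS.findall(java_method_body))


if __name__ == "__main__":
    # Tokens of `x = a + b;` split by hand into operators and operands.
    print(halstead(["=", "+", ";"], ["x", "a", "b"]))
    print(cyclomatic_complexity("if (a > 0 && b > 0) { return a; }"))
```

A tool working at CAM's scale would classify tokens from a real parser rather than by hand; the formulas stay the same.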
DevBench: A Comprehensive Benchmark for Software Development
DevBench is a benchmark designed to evaluate LLMs across various stages of the software development lifecycle, including software design, environment setup, implementation, acceptance testing, and unit testing. By integrating these interconnected steps under a single framework, DevBench offers a holistic perspective on the potential of LLMs for automated software development.
The DevBench dataset comprises 22 curated repositories across 4 programming languages (Python, C/C++, Java, JavaScript), covering a wide range of domains such as machine learning, databases, web services, and command-line utilities.
github
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
LongRoPE is a method that extends the context window of LLMs to 2048k tokens while maintaining their performance within the original, shorter context window.
Code will be available at https://github.com/microsoft/
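LongRoPE builds on rotary position embeddings (RoPE), which rotate each pair of dimensions by an angle proportional to the token position. Context-extension methods rescale these rotation frequencies so that positions beyond the training length map back into the range seen during training; LongRoPE searches for non-uniform rescaling factors rather than using a single uniform one. The sketch below only shows where such factors plug in: the `rescale` values are placeholders, not factors from the paper, and the interleaved pairing of dimensions is an assumption.

```python
import math
from typing import List, Optional


def rope(x: List[float], position: int, base: float = 10000.0,
         rescale: Optional[List[float]] = None) -> List[float]:
    """Apply rotary position embedding to an even-length vector x.

    rescale[i] divides the rotation frequency of the i-th dimension pair.
    None gives plain RoPE; all entries equal to s gives uniform position
    interpolation; unequal entries give a non-uniform rescaling.
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = list(x)
    for i in range(d // 2):
        theta = base ** (-2 * i / d)  # frequency of pair i: high for early pairs
        if rescale is not None:
            theta /= rescale[i]       # slow the rotation to stretch the usable range
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s    # 2-D rotation of the pair (a, b)
        out[2 * i + 1] = a * s + b * c
    return out
```

With a uniform factor of 2, position 14 is rotated exactly like position 7 without rescaling, which is how interpolation keeps extended positions inside the trained range. Because each step is a rotation, vector norms are preserved.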
Open Release of Grok-1
xAI is releasing the base model weights and network architecture of Grok-1, a 314-billion-parameter Mixture-of-Experts (MoE) model trained from scratch by xAI. The released checkpoint is the raw base model from the Grok-1 pre-training phase, which concluded in October 2023. This means the model is not fine-tuned for any specific application, such as dialogue.
The weights and the architecture are released under the Apache 2.0 license.
JAX example code for loading and running the Grok-1 open-weights model: https://github.com/xai-org/grok-1
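To make the MoE part concrete, here is a sketch of generic top-k routing in a Mixture-of-Experts layer: a router scores every expert, only the k best-scoring experts are run, and their outputs are mixed with softmax weights computed over the selected logits (equivalent to renormalizing the full softmax over those k experts). The router logits, the toy experts, and k = 2 are illustrative assumptions; this is not code from the grok-1 repository.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_moe(x: Vector, router_logits: Vector, experts: List[Expert], k: int = 2) -> Vector:
    """Route x to the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)
    weights = [math.exp(router_logits[e] - m) for e in top]  # stable softmax numerators
    total = sum(weights)
    out = [0.0] * len(x)
    for w, e in zip(weights, top):
        y = experts[e](x)  # only the selected experts are evaluated
        for i, v in enumerate(y):
            out[i] += (w / total) * v
    return out


if __name__ == "__main__":
    # Four toy experts that scale their input by 1, 2, 3 and 4.
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    print(top_k_moe([1.0, 1.0], [0.0, 0.0, 5.0, 5.0], experts, k=2))
```

Because unselected experts are never called, compute per token scales with k rather than with the total number of experts, which is how an MoE model can have a very large parameter count while activating only part of it for each token.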
LLM4Decompile: Decompiling Binary Code with Large Language Models
The authors released open-access decompilation LLMs, ranging from 1B to 33B parameters, pre-trained on 4 billion tokens of C source code and the corresponding assembly code. According to their experiments, LLM4Decompile can accurately decompile 2% of the assembly code, a 50% improvement over GPT-4.
Code, dataset, and models are released at https://github.com/albertan017/LLM4Decompile