Comparing Llama Models for Python Function Generation
This article compares two models, Llama 3.1 (8B) and Llama 3.2 (3.2B), on their ability to generate a Python function that computes the height of an object from its image size in pixels and the object’s distance.
Experimental Setup
Both models were run locally on a PC equipped with:
* CPU: Intel i7-8700
* RAM: 16GB
* GPU: Nvidia Quadro P2200
The models operated in generation-only mode, with no feedback or fine-tuning. Each attempt to solve the task involved the following process:
1. Generating Code: The task prompt was structured as a docstring:
    Create a Python function `object_height` that calculates object height
    based on the image height in pixels.
    Input parameters:
    - focal length `F` in mm, default 35 mm
    - image height `H` in pixels, default 1152
    - pixel size `p` in μm, default 3.45 μm
    - object distance `D`, default 6.5 m
    Returns:
    The object height in mm as an absolute float value.
    Raises:
    `RuntimeError` if input parameters are invalid (negative or zero).
2. Extracting the Function Code: The generated response was parsed to isolate the function implementation.
3. Running the Code: The function code was executed locally.
4. Testing the Function: The function was validated with:
* Default parameters.
* Various valid inputs.
* Invalid parameters to test error handling.
The task was considered successfully solved only if the function passed all tests.
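The three test categories can be sketched as follows. This is a minimal illustration, not the harness used in the experiment. The reference `object_height` assumes the pinhole (similar-triangles) approximation, object height = H · p · D / F after unit conversion, because the article does not state which formula was expected. The helper name `passes_all_tests`, the extra input values, and the tolerance are all hypothetical.

```python
import math


def object_height(F=35.0, H=1152, p=3.45, D=6.5):
    """Candidate solution under an ASSUMED pinhole model; returns height in mm."""
    if min(F, H, p, D) <= 0:
        raise RuntimeError("all parameters must be positive")
    image_height_mm = H * p * 1e-3  # pixels * um per pixel -> mm on the sensor
    distance_mm = D * 1e3           # m -> mm
    return abs(image_height_mm * distance_mm / F)


def passes_all_tests(func):
    """Return True only if func passes every check (hypothetical test suite)."""
    try:
        # 1. Default parameters.
        if not math.isclose(func(), 1152 * 3.45 * 6.5 / 35, rel_tol=1e-6):
            return False
        # 2. A valid non-default input: 5 mm image, 10 m away, 50 mm lens.
        if not math.isclose(func(F=50, H=1000, p=5.0, D=10), 1000.0, rel_tol=1e-6):
            return False
    except Exception:
        return False  # a crash on valid input counts as a failure
    # 3. Invalid parameters must raise RuntimeError, not anything else.
    for bad in ({"F": 0}, {"H": -1}, {"p": 0}, {"D": -2.0}):
        try:
            func(**bad)
        except RuntimeError:
            continue
        except Exception:
            return False
        return False  # no exception was raised at all
    return True
```

Under this assumed formula the default parameters give roughly 738.1 mm. A generated function that computes the right value but skips input validation fails the third category, so the attempt is counted as unsuccessful.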
Evaluation Metrics
To measure the effectiveness of each model, three key metrics were considered:
* Number of Attempts: how many attempts were required to produce a correct solution.
* Time per Attempt: the time taken to generate and process the function code in each iteration.
* Total Time to Solve the Task: the cumulative time from the start of the process until a correct solution was achieved.
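A loop along the following lines could collect all three metrics. It is a sketch under stated assumptions: `generate` is a hypothetical callable standing in for the model call (the article does not say how the models were invoked), `check` stands in for the test suite, `extract_code` assumes the response wraps the function in a Markdown code fence, and `max_attempts` is an arbitrary cap.

```python
import re
import time

FENCE = "`" * 3  # built at runtime so this example can sit inside a fence
CODE_RE = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response):
    """Return the first fenced code block, or the raw text if none is found."""
    match = CODE_RE.search(response)
    return match.group(1) if match else response


def solve(generate, check, max_attempts=10):
    """Repeat generate -> extract -> run -> test and record the metrics."""
    attempt_times = []
    solved = False
    start = time.perf_counter()
    for _ in range(max_attempts):
        t0 = time.perf_counter()
        source = extract_code(generate())
        namespace = {}
        try:
            # Generated code is untrusted; run it only in a sandboxed setup.
            exec(source, namespace)
            solved = bool(check(namespace["object_height"]))
        except Exception:
            solved = False  # syntax error, missing function, or failed run
        attempt_times.append(time.perf_counter() - t0)
        if solved:
            break
    return {
        "solved": solved,
        "attempts": len(attempt_times),
        "time_per_attempt": attempt_times,
        "total_time": time.perf_counter() - start,
    }
```

Timing each attempt separately from the overall run keeps the two time metrics distinct: a model with fast attempts can still have a long total time if it needs many of them.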