Sports2D: Compute 2D human pose and angles from a video or a webcam
Description
Depth perspective effects, which make the farther limb look smaller, are now compensated. They represent a 1-2% coordinate error at a 10 m camera distance, and more if the camera is closer.
The pixel-to-meter conversion was improved. It now takes into account:
- the pixel-to-meter scale
- the camera horizon angle
- the floor height
- the perspective effects, with an additional depth parameter (new in this release)
Added a flexible configuration argument for the user to choose which depth information to use (see the sketch after this list). Either:
- Distance from camera to the lane: distance_m
- Focal length in pixels: distance_m = f_px * H/h
- Field of view in degrees or radians: f_px = max(W,H)/2 / tan(fov/2), then distance_m = f_px * H/h
- Calibration file: distance_m = K[0,0] * H/h
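As a rough illustration, the sketch below reduces each of these options to a camera-to-person distance. The function and argument names (to_distance_m, height_m, height_px, img_w, img_h) are made up for this example; this is not the actual Sports2D API.

```python
import numpy as np

def to_distance_m(value, unit, height_m, height_px, img_w, img_h, K=None):
    """Reduce any supported depth specification to a camera-to-person
    distance in meters (illustrative helper, not the Sports2D API)."""
    if unit == 'distance_m':                 # distance given directly
        return value
    if unit == 'f_px':                       # focal length in pixels
        return value * height_m / height_px  # Equation (2): distance_m = f_px * H/h
    if unit in ('fov_deg', 'fov_rad'):       # field of view -> focal length -> distance
        fov = np.radians(value) if unit == 'fov_deg' else value
        f_px = max(img_w, img_h) / 2 / np.tan(fov / 2)
        return f_px * height_m / height_px
    if unit == 'from_calib':                 # K[0,0] is the focal length in pixels
        return K[0, 0] * height_m / height_px
    raise ValueError(f'Unknown perspective_unit: {unit}')
```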
In the same way, the camera horizon angle and the floor height can be specified:
- manually
- from a calibration file
- automatically from gait
```toml
[px_to_meters_conversion] # Config.toml
# Compensate for perspective effects, which make the farther limb look smaller. 1-2% coordinate error at 10 m, less if the camera is further away
perspective_value = 10 # Either camera-to-person distance (m), focal length (px), field of view (degrees or radians), or '' if perspective_unit=='from_calib'
perspective_unit = 'distance_m' # 'distance_m', 'f_px', 'fov_deg', 'fov_rad', or 'from_calib'

# Compensate for camera horizon
floor_angle = 'auto' # float, 'from_kinematics', 'from_calib', or 'auto'. 'auto' is equivalent to 'from_kinematics', ie the angle calculated from foot contacts. 'from_calib' calculates it from a toml calibration file. Use a float to specify it manually, in degrees
xy_origin = ['auto'] # [px_x,px_y], ['from_kinematics'], ['from_calib'], or ['auto']. BETWEEN BRACKETS! ['auto'] is equivalent to ['from_kinematics'], ie the origin estimated at the first foot contact, with the direction of motion as the X direction. ['from_calib'] calculates it from a calibration file. Use [px_x,px_y] to specify it manually, in pixels (px_y points downwards)

# Optional calibration file
calib_file = '' # Calibration file in the Pose2Sim toml format, or '' if not available
```
Note: If the user does not want perspective effects to be taken into account, they can set the distance to a very large value, e.g. perspective_value = 10000 with perspective_unit = 'distance_m'.
Full Changelog: https://github.com/davidpagnon/Sports2D/compare/v0.8.24...v0.8.25
More about pixel-to-meter conversion
Pixel-to-meter scale
Let's start with the pinhole camera model.
The intercept theorem tells us:
distance_m / f_px = Y / y (1)
With:
- distance_m: the distance between the camera origin and the athlete in meters
- f_px: the focal length (the distance between the camera origin and the sensor), converted from mm to pixels
- Y: the coordinate of a point in the scene, in meters
- y: the coordinate of that point on the camera sensor, in pixels
<img width="703" height="318" alt="image" src="https://github.com/user-attachments/assets/b51a71b7-1eed-4ac8-8f1c-7629a457c3d2" />
<br><br>
A particular case of this relation involves the height of the athlete:
distance_m / f_px = H / h (2)
With:
- H: the height of the athlete in meters
- h: the height of the athlete on the camera sensor in pixels
Now, the image coordinates are generally not taken from the center of the image / sensor, but from its top left corner (see image), which means that:
x = u - cu (3)
y = - (v - cv)
With:
- u,v: the image coordinates
- cu, cv: the coordinates of the principal point of the sensor, approximated as the image center
<img width="583" height="357" alt="image" src="https://github.com/user-attachments/assets/deb76c49-450a-4bca-8ff0-4370434dc0e9" />
<br><br>
So we end up with all these relations:
distance_m / f_px = H / h = X/(u-cu) = -Y/(v-cv) (4)
And the simplest case is solved:
X = H / h * (u-cu) (5)
Y = - H / h * (v-cv)
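For concreteness, here is Equation (5) as a minimal Python sketch (the function and argument names are illustrative):

```python
def px_to_m_simple(u, v, cu, cv, height_m, height_px):
    """Equation (5): pixel-to-meter conversion in the simplest case
    (level camera, origin at the image center)."""
    scale = height_m / height_px      # H/h, meters per pixel
    X = scale * (u - cu)
    Y = -scale * (v - cv)             # v points down, Y points up
    return X, Y
```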
<table> <thead style="text-align: left;"> <tr> <th scope="col" style="text-align: left;">Person height calculation:</th> </tr> </thead> <tbody> <tr> <td scope="row" style="text-align: left;">I calculate the height in pixels from the following distances: <code>height = (rfoot+lfoot)/2 + (rshank+lshank)/2 + (rfemur+lfemur)/2 + (rback+lback)/2 + head</code>, with:<br> <ul> <li>Foot: distance from heel to ankle (or 10 cm if the pose does not provide any heel point) </li> <li>Shank: distance from ankle to knee </li> <li>Femur: distance from knee to hip </li> <li>Back: distance from hip to shoulder </li> <li>Head: distance from midshoulder to top head point (or distance from midshoulder to nose*1.33 if the pose model does not provide any top head point) </li> </ul> Not all frames are good, therefore: <ul> <li>I first remove the 20% fastest frames (potential outliers), the frames where speed is close to zero (person might be out of frame), and the frames where the hip and knee angles are below 45° (coordinates are imprecise when the person is crouching). </li> <li>And I take the trimmed mean over the remaining frames, after removing the 20% most extreme values. </li> </ul> </td> </tr> </table>
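The frame filtering described above could be sketched as follows, assuming per-frame height, speed, and angle arrays are already available. The names and the near-zero speed threshold are illustrative, not the actual Sports2D internals.

```python
import numpy as np
from scipy import stats

def robust_height_px(heights_px, speeds_px, hip_knee_angles_deg, min_speed=0.5):
    """Robust per-video height estimate: drop unreliable frames, then take a
    trimmed mean over the rest. min_speed is an assumed near-zero threshold."""
    heights = np.asarray(heights_px, float)
    speeds = np.asarray(speeds_px, float)
    angles = np.asarray(hip_knee_angles_deg, float)
    keep = speeds < np.quantile(speeds, 0.8)    # drop the 20% fastest frames
    keep &= speeds > min_speed                  # drop near-zero speeds (person may be out of frame)
    keep &= angles >= 45                        # drop crouching frames (imprecise coordinates)
    return stats.trim_mean(heights[keep], 0.1)  # drop the 20% most extreme values (10% per tail)
```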
Compensation for the camera horizon
The camera is not always set perfectly horizontally, and we may want to compensate for it. After evaluating the angle from gait kinematics or from a calibration file, we can change the coordinate system:
xang = x*cos(ang) + y*sin(ang) (6)
yang = y*cos(ang) - x*sin(ang)
With ang the camera horizon angle.
Reinjecting this in the previous formula gives:
X = H / h * ((u-cu)*cos(ang) + (v-cv)*sin(ang)) (7)
Y = - H / h * ((v-cv)*cos(ang) - (u-cu)*sin(ang))
Moreover, we want the floor to be situated at Y = 0 so that feet are in contact with the floor. Instead of considering that the pixel origin is at the center of the image, we determine (cx,cy) as the intersection between the left border of the image and the floor line, determined from kinematics. We can simply replace (cu,cv) by (cx,cy) in the previous formula:
X = H / h * ((u-cx)*cos(ang) + (v-cy)*sin(ang)) (8)
Y = - H / h * ((v-cy)*cos(ang) - (u-cx)*sin(ang))
<img width="820" height="510" alt="image" src="https://github.com/user-attachments/assets/53d5bd22-b57d-4cee-9208-979fdf56f4df" />
<br><br>
<table> <thead style="text-align: left;"> <tr> <th scope="col" style="text-align: left;">Determination of the camera horizon and floor height from kinematics:</th> </tr> </thead> <tbody> <tr> <td scope="row" style="text-align: left;">The floor line (origin and angle) is estimated from the line that fits foot ground contacts. <br>Ground contacts are estimated as the coordinates where the feet's horizontal velocities are close to zero (default: 7 px/s). Points with low confidence are removed (default: 0.3). The output of the fit is (slope, intercept). We obtain: <ul> <li>ang = -arctan(slope) </li> <li>origin = (0,intercept) = (cx,cy), ie the intersection between the left border and the floor line </li> <li>gait_direction: left-to-right if ang>0, and right-to-left otherwise. </li> </ul> <img width="1173" height="349" alt="image" src="https://github.com/user-attachments/assets/61f340bc-a7ce-4efe-86ef-acf086427728" /> </td> </tr> </table>
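A minimal sketch of this floor-line fit, assuming foot trajectories, speeds, and confidences have already been extracted (variable names are illustrative):

```python
import numpy as np

def fit_floor_line(foot_xy, foot_speed_x, foot_conf,
                   speed_thresh=7.0, conf_thresh=0.3):
    """Fit the floor line through estimated foot ground contacts.
    foot_xy: (N,2) pixel coordinates; foot_speed_x: horizontal speeds in px/s."""
    contacts = (np.abs(foot_speed_x) < speed_thresh) & (foot_conf > conf_thresh)
    xs, ys = foot_xy[contacts, 0], foot_xy[contacts, 1]
    slope, intercept = np.polyfit(xs, ys, 1)   # least-squares line through contacts
    ang = -np.arctan(slope)                    # camera horizon angle
    origin = (0.0, intercept)                  # (cx, cy): floor line at the left border
    direction = 'left-to-right' if ang > 0 else 'right-to-left'
    return ang, origin, direction
```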
Compensation for depth perspective
The person's left and right limbs are not situated at the same depth. Due to perspective, the further limb can look smaller, especially if the camera is close to the athlete. We can compensate for this effect.
We can extract from Equation (4):
distance_m / f_px = -Y/(v-cv) (9)
Adding in the depth offset, we get:
(distance_m + depth_offset) / f_px = -Y/(v-cv) (10)
With depth_offset the offset of the joint with regard to the body midline, in meters.
This equation can be reorganized to separate the coordinates at the body midline from their offsets due to depth:
Y = - [ distance_m / f_px * (v-cv) + depth_offset / f_px * (v-cv) ] (11)
Now, there is a catch: we want the floor to be situated at Y=0, so (v-cv) should be replaced by (v-cy) in the first part of the equation. On the other hand, we want the depth offset to be zero at the center of the image (and larger further away from it), so the second part of the equation should not be changed (see image). So we obtain:
Y = - [ distance_m / f_px * (v-**cy**) + depth_offset / f_px * (v-cv) ] (12)
<img width="581" height="344" alt="image" src="https://github.com/user-attachments/assets/4dcabafd-9161-4355-9f2a-77435f73ea04" />
<br><br>
Equation (2) tells us that distance_m / f_px = H / h. Substituting it, and applying the same reasoning to the horizontal coordinate, we get:
X = H/h * [ (u-cx) + depth_offset / distance_m * (u-cu) ] (13)
Y = - H/h * [ (v-cy) + depth_offset / distance_m * (v-cv) ]
Finally, when taking the angles into consideration, the final formula becomes longer, although not more complex:
<table> <thead style="text-align: left;"> <tr> <th style="text-align: left;">Final formula:</th> </tr> </thead> <tbody> <tr> <td scope="row" style="text-align: left;">X = H/h * [ (14)<br> ( (u-cx) + depth_offset / distance_m * (u-cu) ) * cos(ang) + <br> ( (v-cy) + depth_offset / distance_m * (v-cv) ) * sin(ang) <br> ] <br> Y = - H/h * [<br> ( (v-cy) + depth_offset / distance_m * (v-cv) ) * cos(ang) -<br> ( (u-cx) + depth_offset / distance_m * (u-cu) ) * sin(ang) <br> ] </td> </tr> </table>
With:
- X: the horizontal coordinate of a point in meters
- Y: the vertical coordinate of a point in meters, pointing upwards
- u: the horizontal image coordinate of a point in pixels
- v: the vertical image coordinate of a point in pixels, pointing downwards
- cx: the horizontal coordinate of the left image border, ie 0 px
- cy: the intersection between the left border and the floor line, in pixels
- cu: the horizontal center of the image in pixels, ie width/2
- cv: the vertical center of the image in pixels, ie height/2
- H: the height of the athlete in meters
- h: the height of the athlete on the camera sensor in pixels
- depth_offset: the offset of the joint with regard to the body midline, in meters
- ang: the camera horizon angle
- distance_m: the distance between the camera origin and the athlete in meters
- f_px: the focal length (the distance between the camera origin and the sensor), converted from mm to pixels
<img width="839" height="827" alt="image" src="https://github.com/user-attachments/assets/40789d36-9926-43cd-83da-42d6341139bb" />
<br><br>
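Putting it together, the final formula can be written as a short Python function; the names mirror the notation list above (illustrative, not the Sports2D API):

```python
import numpy as np

def px_to_m(u, v, height_m, height_px, cx, cy, cu, cv,
            ang, depth_offset, distance_m):
    """Equation (14): pixel-to-meter conversion with floor origin,
    camera horizon, and depth perspective compensation."""
    scale = height_m / height_px                          # H/h
    du = (u - cx) + depth_offset / distance_m * (u - cu)  # horizontal term
    dv = (v - cy) + depth_offset / distance_m * (v - cv)  # vertical term
    X = scale * (du * np.cos(ang) + dv * np.sin(ang))
    Y = -scale * (dv * np.cos(ang) - du * np.sin(ang))
    return X, Y
```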
Note: I used canonical values for the joint-to-body-midline distances, which range from about 10 cm (knees) to 20 cm (shoulders). This is a rough estimate, since depths obviously evolve during the sprint, but with 2D pose estimation it is the best we can get.
Corollary: If the camera is at a 10 m distance, the depth_offset/distance_m ratio is 1 to 2%. The (v-cv) factor ensures that the offset vanishes at the center of the image.
Calibration file loading and generation
Regardless of the user's choice, a calibration file is generated:
- The intrinsic matrix is written as:
  [f_px, 0, cu]
  [0, f_px, cv]
  [0, 0, 1]
- The rotation (stored as a Rodrigues vector) is calculated from the matrix:
  [cos(ang), 0, sin(ang)]
  [0, 1, 0]
  [-sin(ang), 0, cos(ang)]
- The translation vector is calculated as:
  [-H/h*(cx-cu), H/h*f_px, H/h*(cy-cv)]
- The distortions are assumed to be nonexistent
Note that the actual formulas for rotation and translation are a bit more complex, as their vectors need to be converted back and forth between the world and the camera points of view.
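Ignoring that world-to-camera conversion, a simplified sketch of the generated parameters could look like this (assuming OpenCV for the Rodrigues conversion; names are illustrative):

```python
import numpy as np
import cv2

def build_calibration(f_px, cu, cv, cx, cy, ang, height_m, height_px):
    """Simplified calibration parameters, as described above."""
    K = np.array([[f_px, 0,    cu],
                  [0,    f_px, cv],
                  [0,    0,    1]], float)                  # intrinsic matrix
    R = np.array([[np.cos(ang),  0, np.sin(ang)],
                  [0,            1, 0          ],
                  [-np.sin(ang), 0, np.cos(ang)]], float)   # rotation matrix
    rvec, _ = cv2.Rodrigues(R)                              # Rodrigues rotation vector
    scale = height_m / height_px                            # H/h
    T = np.array([-scale*(cx - cu), scale*f_px, scale*(cy - cv)])
    dist = np.zeros(4)                                      # distortions assumed nonexistent
    return K, rvec, T, dist
```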
Incidentally, producing a calibration file allows for a better visualization, with an overlay of the OpenSim skeleton.
<img width="851" height="229" alt="image" src="https://github.com/user-attachments/assets/3a352b0d-973e-48ed-a8ec-182e3abfaaff" />
Files
| Name | Size |
|---|---|
| davidpagnon/Sports2D-v0.8.25.zip (md5:54e4493c1cda62c8a71afefce713a327) | 9.9 MB |
Additional details
Related works
- Is supplement to: https://github.com/davidpagnon/Sports2D/tree/v0.8.25 (Software)
Software
- Repository URL: https://github.com/davidpagnon/Sports2D