1. Executive Summary

When I started salsa dancing, I thought the biggest challenge would be learning to move my hips in the way that makes salsa dancers look so good. It never occurred to me that the most challenging aspect would be something much more basic: finding the “1”.

Casino (Cuban salsa) is based on a 1–2–3, 5–6–7 count, and knowing where the 1 falls in the song is essential. In theory, it should be simple to find, and in most music, it is. But Cuban salsa behaves differently. It is a complex, dynamic form, grounded in rhythm yet moving, at times, into moments that blur the sense of the 1. It is this balance at the edge of chaos that makes the music, and in turn the dance, so compelling, but also incredibly challenging.

It was at a social that I found myself standing in the middle of the dance floor, surrounded by couples in motion, unable to lock onto the 1. That was the moment I decided to approach the music as a data analyst.

This project explores whether measurable audio features can reveal structural patterns within the salsa groove that help infer the position of the “1”.

Initial experiments suggest:

  • Beat-level signals alone do not reliably indicate the downbeat
  • Aggregating signals at the 4-pair level (1–2, 3–4, 5–6, 7–8) reveals a consistent groove envelope
  • No single transition point consistently defines the structure of the groove

Implication:
Rather than detecting the “1” directly, a more promising approach is to infer position within a repeating rhythmic pattern.

2. Problem Statement

In casino (Cuban) salsa music, dancers orient themselves using an internal count:

1 2 3 — 5 6 7

The “1” represents the beginning of the musical phrase and is essential for dance synchronization.

However:

  • Salsa rhythm is polyrhythmic
  • The downbeat is often not emphasized
  • Traditional beat detection does not reliably recover cycle orientation
  • The 1 can shift within a song

The challenge:
Can machine-measurable audio signals reveal structural cues that allow a system to infer the position of the “1”?

3. Dataset

The dataset consists of 10 salsa recordings.

For each track:

  • The true downbeat (“1”) was manually annotated by a dancer
  • Beats were detected using Librosa
  • The beat grid was aligned with the annotated downbeat

Each beat was labeled with:

Label            Meaning
beat_mod8_true   Beat position within the 8-beat cycle
beat_pair_true   Pair grouping within the cycle

Pair structure:

Pair    Beats
pair0   1–2
pair1   3–4
pair2   5–6
pair3   7–8
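The labeling step can be sketched as below. This is a minimal illustration, not the project's code. It assumes beat times in seconds are already available (for example from `librosa.beat.beat_track` followed by `librosa.frames_to_time`), that `beat_mod8_true` is 0-indexed so that 0 is the annotated “1”, and that the grid is anchored at the detected beat nearest the annotation. The function name `label_beats` is hypothetical.

```python
def label_beats(beat_times, downbeat_time):
    """Label detected beats relative to a manually annotated "1".

    Assumptions (hypothetical, for illustration): beat_times are in seconds,
    beat_mod8_true is 0-indexed (0 = the "1"), and pairs follow the pair
    table (beat_pair_true 0 = beats 1-2, ..., 3 = beats 7-8).
    """
    # Anchor the 8-beat grid at the detected beat closest to the annotation.
    anchor = min(range(len(beat_times)), key=lambda i: abs(beat_times[i] - downbeat_time))
    labels = []
    for i, t in enumerate(beat_times):
        # Python's % is never negative, so beats before the anchor wrap to 7, 6, ...
        mod8 = (i - anchor) % 8
        labels.append({"time": t, "beat_mod8_true": mod8, "beat_pair_true": mod8 // 2})
    return labels
```

Because the grid is anchored once per track, a song whose “1” shifts partway through would be mislabeled after the shift, which is one reason tracking such changes appears under Future Work.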

4. Audio Features Analyzed

The analysis used several common audio descriptors:

Feature             Meaning
onset_strength      Strength of rhythmic attacks
spectral_flux       Rate of spectral change
spectral_centroid   Brightness of the sound
low_energy          Bass frequency energy
energy_contrast     Contrast between frequency bands
rms                 Overall signal energy

These signals were analyzed relative to the beat structure of each track.
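As an illustration of how such descriptors can be computed and then read off at each beat, the sketch below derives four of them from a short-time Fourier transform using NumPy. The exact definitions used in this project are not stated, so these are assumptions: `low_energy` is taken as power below a hypothetical 150 Hz cutoff, and `spectral_flux` as the summed positive magnitude change between consecutive frames. librosa also provides `librosa.onset.onset_strength`, `librosa.feature.spectral_centroid`, `librosa.feature.rms` and `librosa.feature.spectral_contrast` for several of these descriptors.

```python
import numpy as np

def frame_features(y, sr, n_fft=2048, hop=512, low_cutoff_hz=150.0):
    """Frame-level versions of four descriptors (hypothetical definitions)."""
    y = np.asarray(y, dtype=float)
    n_frames = 1 + (len(y) - n_fft) // hop          # assumes len(y) >= n_fft; no padding
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    # Magnitude spectrum of each Hann-windowed frame: shape (n_frames, n_fft // 2 + 1)
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    rms = np.sqrt(np.mean(frames ** 2, axis=1))                        # overall energy
    centroid = (mag @ freqs) / np.maximum(mag.sum(axis=1), 1e-12)     # brightness in Hz
    low_energy = np.sum(mag[:, freqs <= low_cutoff_hz] ** 2, axis=1)   # bass-band energy
    # Spectral flux: summed positive change from the previous frame (0 for the first frame)
    rise = np.diff(mag, axis=0, prepend=mag[:1])
    flux = np.maximum(rise, 0.0).sum(axis=1)

    times = (np.arange(n_frames) * hop + n_fft / 2) / sr               # frame centers in seconds
    return times, {"rms": rms, "spectral_centroid": centroid,
                   "low_energy": low_energy, "spectral_flux": flux}

def sample_at_beats(times, values, beat_times):
    """Read a frame-level feature at each beat time by linear interpolation."""
    return np.interp(beat_times, times, values)
```

Interpolating at beat times is only one choice; averaging the frames between consecutive beats would smooth out short transients instead.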

5. Beat-Level Analysis

The first experiment evaluated whether any feature consistently marks the downbeat.

Method:

df.groupby("beat_mod8_true")

Result:

  • Beat-level signals oscillate naturally
  • No signal reliably indicates beat 1

Conclusion:
Individual beats do not provide a reliable cue for the downbeat.
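One way to make “no signal reliably indicates beat 1” measurable is to count, per feature, how often beat 1 carries the highest average value within a track. The sketch below assumes pandas, a hypothetical `track` column identifying each recording, per-track z-scoring (so loud and quiet recordings are comparable) and a 0-indexed `beat_mod8_true`. It is not necessarily the aggregation used in this project.

```python
import pandas as pd

def beat1_hit_rate(df, features, track_col="track"):
    """For each feature, the fraction of tracks whose highest mean value falls on beat 1.

    Assumes beat_mod8_true is 0-indexed (0 = the "1") and a hypothetical
    `track` column naming the recording each beat came from.
    """
    g = df.groupby(track_col)[features]
    z = df.copy()
    # z-score each feature within its own track
    z[features] = (df[features] - g.transform("mean")) / g.transform("std")
    rates = {}
    for f in features:
        # rows = tracks, columns = beat positions 0..7, values = mean z-score
        profile = z.groupby([track_col, "beat_mod8_true"])[f].mean().unstack("beat_mod8_true")
        rates[f] = float((profile.idxmax(axis=1) == 0).mean())
    return rates
```

With eight positions, a feature unrelated to the downbeat would land on beat 1 in roughly 1 track in 8 by chance, which gives a baseline for judging whether a hit rate is meaningful.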

6. Groove Envelope Analysis

Signals were aggregated by beat pair:

df.groupby("beat_pair_true")

This revealed a consistent pattern across tracks.

Typical groove envelope:

Pair   Pattern
1–2    Mild lift
3–4    Trough
5–6    Peak
7–8    Release

Interpretation:
The salsa groove appears to follow a 4-pair structural cycle rather than behaving as eight independent beats.
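The pair-level aggregation and a simple check of the envelope shape can be sketched as follows, assuming pandas. Centering each track's four pair means on that track's own average keeps only the shape of the envelope. The consistency score, which asks whether the trough lands on 3–4 and the peak on 5–6 as in the typical envelope, is a hypothetical measure, as are the function names and the `track` column.

```python
import pandas as pd

def pair_envelope(df, feature, track_col="track"):
    """Per-track mean of `feature` for each beat pair, centered on the track's own average.

    Rows are tracks; columns are beat_pair_true (0 = 1-2, 1 = 3-4, 2 = 5-6, 3 = 7-8).
    The `track` column name is a hypothetical assumption.
    """
    per_track = (df.groupby([track_col, "beat_pair_true"])[feature]
                   .mean()
                   .unstack("beat_pair_true"))
    # Subtract each track's mean so only the shape of the envelope remains
    return per_track.sub(per_track.mean(axis=1), axis=0)

def envelope_consistency(centered):
    """Fraction of tracks whose trough is pair1 (3-4) and whose peak is pair2 (5-6)."""
    trough_ok = centered.idxmin(axis=1) == 1
    peak_ok = centered.idxmax(axis=1) == 2
    return float((trough_ok & peak_ok).mean())
```

Averaging `pair_envelope(df, f)` over rows gives a single typical envelope per feature, while `envelope_consistency` shows how many individual tracks actually follow it.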

7. Pair Transition Analysis

Energy transitions between beat pairs were examined to determine whether consistent structural shifts occur within the 8-beat cycle.

Transitions tested:

  • 1–2 → 3–4
  • 3–4 → 5–6
  • 5–6 → 7–8
  • 7–8 → 1–2

Result:

  • A noticeable increase in energy often occurs in the 3–4 → 5–6 transition
  • However, this transition is not consistent across tracks
  • Different signals frequently identify different transition points

Conclusion:
No single transition reliably defines a structural pivot within the cycle.
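The transition analysis reduces to differences between consecutive pair means, with 7–8 → 1–2 wrapping around the cycle. The plain-Python sketch below (function names hypothetical) finds each track's largest rise and reports how many tracks agree on it, one simple way to quantify “not consistent across tracks”. Input is assumed to be four pair means per track, ordered 1–2, 3–4, 5–6, 7–8.

```python
from collections import Counter

PAIR_NAMES = ["1-2", "3-4", "5-6", "7-8"]

def pair_transitions(pair_means):
    """Change in a feature's mean from one pair to the next, including the wrap 7-8 -> 1-2."""
    return {f"{PAIR_NAMES[k]} -> {PAIR_NAMES[(k + 1) % 4]}": pair_means[(k + 1) % 4] - pair_means[k]
            for k in range(4)}

def largest_rise(pair_means):
    """Transition with the biggest increase (ties go to the earliest transition)."""
    rises = pair_transitions(pair_means)
    return max(rises, key=rises.get)

def transition_agreement(per_track_pair_means):
    """Most common largest-rise transition across tracks and the share of tracks showing it."""
    counts = Counter(largest_rise(m) for m in per_track_pair_means.values())
    transition, n = counts.most_common(1)[0]
    return transition, n / len(per_track_pair_means)
```

A share close to 1.0 would indicate a stable structural pivot; a share well below that matches the finding that the 3–4 → 5–6 rise is common but not universal.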

8. Multi-Signal Analysis

To improve stability, multiple audio features were analyzed together:

  • energy_contrast
  • onset_strength
  • low_energy
  • spectral_flux

Method:

  • Compute pair-level averages
  • Evaluate transitions between pairs
  • Each feature produces its own estimate of where the major shift occurs

Result:

  • Individual signals often disagree
  • Combining signals does not consistently resolve disagreement
  • No stable “hinge” point emerges

Conclusion:
The groove is not defined by a single, consistent transition point.
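The multi-signal step can be illustrated with two hypothetical fusion rules: a majority vote over each feature's largest rise, and the largest rise of an averaged, z-scored pair profile. Neither is confirmed as the method used in this project; they only show how disagreement between features can be made explicit. The input format (feature name mapped to four pair means) and all function names are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

PAIR_NAMES = ["1-2", "3-4", "5-6", "7-8"]

def largest_rise(pair_means):
    """Transition with the biggest increase between consecutive pairs (wrapping 7-8 -> 1-2)."""
    rises = {f"{PAIR_NAMES[k]} -> {PAIR_NAMES[(k + 1) % 4]}": pair_means[(k + 1) % 4] - pair_means[k]
             for k in range(4)}
    return max(rises, key=rises.get)

def standardize(values):
    """z-score a feature's four pair means so features on different scales can be averaged."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]

def combine_features(feature_pair_means):
    """feature_pair_means maps feature name -> [mean for 1-2, 3-4, 5-6, 7-8]."""
    # Rule 1: each feature votes for its own largest rise.
    votes = {f: largest_rise(m) for f, m in feature_pair_means.items()}
    top, n = Counter(votes.values()).most_common(1)[0]
    # Rule 2: average the z-scored profiles, then take the largest rise of the fused profile.
    z = [standardize(m) for m in feature_pair_means.values()]
    fused = [mean(col) for col in zip(*z)]
    return votes, top, n / len(votes), largest_rise(fused)
```

A low vote share, or a fused answer that differs from the vote, signals that the features are pointing at different transitions, which is the situation the Result bullets describe.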

9. Key Insight

Early analysis suggested a possible structural “hinge” within the 8-beat cycle. However, this proved unreliable.

A clearer pattern emerged at the pair level:

  • The cycle is not best understood as eight independent beats
  • It is not defined by a single pivot point
  • It exhibits a repeating 4-pair structure

Conceptually:
The groove behaves as a continuous pattern rather than a discrete event.

10. Reframing the Problem

These findings suggest a shift in approach.

Instead of:

  • Detecting a single event (the downbeat or a hinge)

A more promising strategy is:

  • Inferring position within a repeating rhythmic pattern

This reframes the problem from event detection to cycle orientation.
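Cycle orientation can be framed as choosing which of the eight beats in a detected grid is the “1”. A minimal sketch, assuming a single feature already averaged per grid position: rotate the 8-value profile through all eight offsets and score each against a template. The template values below are invented to loosely follow the groove envelope table (mild lift, trough, peak, release) and would need to be learned from annotated tracks; Pearson correlation is a swapped-in scoring choice, not the project's method.

```python
# Hypothetical pair template (1-2, 3-4, 5-6, 7-8): mild lift, trough, peak, release.
PAIR_TEMPLATE = [0.25, -1.0, 1.0, -0.25]
# Expand to one value per beat: each pair value is used for both of its beats.
BEAT_TEMPLATE = [v for v in PAIR_TEMPLATE for _ in range(2)]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def orientation_scores(profile):
    """Score each hypothesis k = "the 1 falls on grid position k" (k = 0..7).

    profile is a list of 8 mean feature values for an 8-beat grid
    whose starting beat was chosen arbitrarily.
    """
    scores = []
    for k in range(8):
        rotated = profile[k:] + profile[:k]   # grid position k becomes the candidate "1"
        scores.append(pearson(rotated, BEAT_TEMPLATE))
    return scores

def best_orientation(profile):
    """Grid position with the highest template score."""
    scores = orientation_scores(profile)
    return max(range(8), key=scores.__getitem__)
```

Because the whole cycle is scored at once, no single beat or transition has to stand out; turning the eight scores into probabilities (for example with a softmax) would connect this to the probabilistic approaches listed under Future Work.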

11. Limitations

  • The dataset contains only 10 tracks
  • Salsa arrangements vary significantly
  • Beat detection errors may affect results
  • Feature behavior may vary across styles

Further validation with a larger dataset is required.

12. Future Work

  • Expand dataset to 50+ tracks
  • Refine pair-level and cycle-based representations
  • Track changes in the position of the 1 within songs
  • Evaluate additional rhythmic and spectral features
  • Explore probabilistic and machine learning approaches

13. Conclusion

This exploratory analysis suggests that salsa rhythm is not easily reduced to a single identifiable downbeat or structural pivot.

Instead, the groove follows a repeatable, multi-beat pattern that is more effectively understood at the level of paired beats.

If validated at scale, this approach may offer a more robust method for inferring rhythmic orientation in complex dance music.