To process input audio we use libfftw3, which is fed PCM16 audio data. This format can be created using ffmpeg with the parameters "-f u16le -acodec pcm_s16le". The conversion is done automatically by openfp_extract, which accepts any input supported by ffmpeg (videos, audio files).
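As an illustration of the PCM16 format, the sketch below decodes raw 16-bit little-endian bytes into floating-point samples. It assumes signed samples (as the codec name pcm_s16le suggests) and a single channel. The function name decode_pcm16le is hypothetical and not part of OpenFP.

```python
import struct
from typing import List


def decode_pcm16le(raw: bytes) -> List[float]:
    """Decode signed 16-bit little-endian PCM bytes to floats in [-1.0, 1.0).

    Assumption: mono audio, signed samples. A trailing odd byte is ignored.
    """
    usable = len(raw) - (len(raw) % 2)  # keep only whole 2-byte samples
    count = usable // 2
    samples = struct.unpack("<%dh" % count, raw[:usable])
    # Scale the integer range [-32768, 32767] to [-1.0, 1.0).
    return [s / 32768.0 for s in samples]
```

Samples scaled this way can be split into overlapping frames and passed to an FFT routine.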
The following graph is a block diagram describing the fingerprint extraction.
The result of the fingerprint extraction is a set of 32-bit subfingerprints stored in a single output file representing the original audio file. Each bit in each 32-bit value describes the spectral power in a specific Bark band. Because it is a single bit, it cannot encode the energy itself, only a difference: an edge in the band's energy over time (as produced by the lowpass filtering).
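To make the "one bit per band" idea concrete, here is a minimal sketch of how 32 bits could be derived from band energies. It follows the sign-of-energy-difference rule known from the audio fingerprinting literature (Haitsma and Kalker). Whether OpenFP uses exactly this rule, 33 input bands, or this bit order is an assumption, and the function name subfingerprint is hypothetical.

```python
from typing import Sequence


def subfingerprint(prev_bands: Sequence[float], cur_bands: Sequence[float]) -> int:
    """Pack 32 energy-difference signs into one 32-bit integer.

    Assumption: each frame provides 33 band energies, so that 32
    neighbouring-band differences are available. Band 0 maps to the
    most significant bit.
    """
    if len(prev_bands) != 33 or len(cur_bands) != 33:
        raise ValueError("expected 33 band energies per frame")
    value = 0
    for m in range(32):
        # Difference between neighbouring bands, then its change over time.
        diff = (cur_bands[m] - cur_bands[m + 1]) - (prev_bands[m] - prev_bands[m + 1])
        # Only the sign survives: 1 for a rising edge, 0 otherwise.
        value = (value << 1) | (1 if diff > 0 else 0)
    return value
```

Because only the sign of each difference is kept, the result is insensitive to overall volume changes, which is one reason such bit patterns are attractive for matching.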
To speed up matching we calculate MFCC feature vectors, which we use to cluster the subfingerprints of all fingerprints. One feature vector is stored in the fingerprint file every n subfingerprints. Clustering happens during startup of the server process.
Matching is performed by the openfp_server, which answers requests sent by openfp_match. Matching is a two-step process:
First, suitable match positions are found by comparing one or more randomly chosen fingerprints from the audio sample to be checked with all reference fingerprints. A position is suitable if the Hamming distance between the two compared fingerprints falls below a certain threshold. Second, for each suitable position the average Hamming distance over the following fingerprints (corresponding to a few seconds of audio) is calculated. Only when this average is also below the matching threshold is the fingerprint considered a real match.
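The two steps can be sketched as follows. This is an illustration under stated assumptions, not the openfp_server implementation: each compared item is treated as one 32-bit value, the search is a plain linear scan without clustering, and the names find_match, threshold, window and probes, as well as their default values, are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 32-bit values."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def find_match(
    sample: Sequence[int],
    reference: Sequence[int],
    threshold: float = 8.0,   # assumed value, in bits
    window: int = 64,         # assumed length of the verification run
    probes: int = 3,          # how many random sample positions to try
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, float]]:
    """Return (offset, average distance) of the best match, or None.

    The offset is the position of the sample's first entry relative
    to the reference.
    """
    rng = rng or random.Random()
    if len(sample) < window or len(reference) < window:
        return None
    best = None
    for _ in range(probes):
        s = rng.randrange(len(sample) - window + 1)
        for r in range(len(reference) - window + 1):
            # Step 1: cheap test on a single pair of values.
            if hamming(sample[s], reference[r]) >= threshold:
                continue
            # Step 2: average distance over the following values.
            total = sum(hamming(sample[s + k], reference[r + k]) for k in range(window))
            avg = total / window
            if avg < threshold and (best is None or avg < best[1]):
                best = (r - s, avg)
    return best
```

Step 1 rejects most positions after a single XOR and bit count, so the more expensive averaging in step 2 runs only for the few candidates that survive.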
The two-step matching is motivated by performance considerations. The openfp_server implementation also uses fingerprint clustering to eliminate further unnecessary comparisons.