Sounds good - part 1
I've spent a bit of time recently on one particular coding project of mine: Go-Sound, a library for sound manipulation and analysis written in Golang. When explaining it to people, however, it turns out that not many people know how sound works - indeed, I didn't until recently - which is interesting, as it's something we use all the time (statistically, you're probably hearing something while reading this) but don't really know much about. This post is an introduction to the library, but hopefully also serves as a tutorial on what sound actually is.
Sound = changing pressure
When you hear a sound, what are you actually sensing? Sound waves are changes in pressure that are picked up by your eardrum. When you hit a drum, pluck a string, or blow air through a tube, you are vibrating the air, and these pressure changes travel out in all directions - if they hit an ear, those changes get interpreted as sound. One convenient way to represent sound, then, is a graph of pressure over time - e.g. go to any song on my SoundCloud and this is (roughly) what the bar chart represents, as shown above.
Sampling
How do computers deal with sound? Changes in pressure are digitized by 'sampling': you measure the pressure very frequently using something like a microphone, and you end up with a stream of values representing the pressure at each point in time. These samples can then be sent to speakers to be converted back into pressure changes you can hear.
In go-sound, the sampling is done 44100 times per second ('44.1kHz', a common sample rate), and the pressure is normalized to the range -1.0 to 1.0. The fundamental 'Sound' data type is then something with a channel, to which it writes its samples over time.
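To make that concrete, here's a minimal sketch of the idea in Go - my own illustration of a channel-based sound source, not go-sound's actual types:

```go
package main

import "fmt"

const sampleRate = 44100 // samples per second, i.e. 44.1kHz

// constantPressure is a toy sound source: it writes a fixed pressure value
// to a channel, one sample at a time, then closes the channel.
func constantPressure(value float64, numSamples int) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for i := 0; i < numSamples; i++ {
			out <- value
		}
	}()
	return out
}

func main() {
	// Read back one second of samples at 44.1kHz and count them.
	count := 0
	for range constantPressure(0.25, sampleRate) {
		count++
	}
	fmt.Println("samples received:", count) // 44100
}
```

go-sound's real Sound type wraps more than this, but a channel of samples is the core of the representation, and the sketches later in this post all build on that same idea.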
Tone
If atoms are the building blocks of chemistry, what are the atoms of sound? Generally these are thought of as 'tones' - or you might think of them as 'notes' or 'pitches'. A single key on a piano, a particular position on a violin string, or - the best example - a tuning fork are all close to what is considered a note. In the changing-pressure-over-time definition, these are sinusoids (i.e. sine waves) that repeat at a certain frequency. If you've ever heard of 'A440', that's saying that the note 'A' is 440Hz, i.e. the pressure wave goes through a full cycle 440 times per second.
This is done in go-sound via sounds.NewSineWave(440.0), which creates an (infinite) sine wave at the given pitch; when played, it will sound like A440. The code for this cycles a value from 0 to 2*PI over the correct number of samples, and writes the sine of that value to the channel.
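Here's a rough sketch of that phase-cycling loop - again my own illustration rather than the library's code, but it shows the idea:

```go
package main

import (
	"fmt"
	"math"
)

const sampleRate = 44100.0

// sineWave emits sin(phase), advancing the phase so that it completes
// `hz` full cycles of 2*Pi per second's worth of samples.
func sineWave(hz float64, numSamples int) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		phase, step := 0.0, 2*math.Pi*hz/sampleRate
		for i := 0; i < numSamples; i++ {
			out <- math.Sin(phase)
			phase += step
			if phase >= 2*math.Pi {
				phase -= 2 * math.Pi // wrap to stay in [0, 2*Pi)
			}
		}
	}()
	return out
}

func main() {
	// At 440Hz one full cycle spans roughly 44100/440 ≈ 100 samples.
	for s := range sineWave(440.0, 4) {
		fmt.Printf("%.4f ", s)
	}
	fmt.Println()
}
```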
An example tone:
Timing
If you play that in go-sound, you'll notice it never stops - the samples just keep getting written, and you can play it as long as you like. This isn't really a problem, but in general you want your sounds to stop. go-sound provides sounds.NewTimedSound(Sound, DurationMS), which forces any sound to end after a given duration; this is achieved by closing the sample channel after the required number of samples (44.1 samples per millisecond at 44.1kHz).
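The idea behind the timing wrapper can be sketched with plain channels like this (an illustration of the approach, not NewTimedSound's actual implementation):

```go
package main

import "fmt"

const sampleRate = 44100

// timed forwards at most the number of samples that fit into durationMS
// milliseconds (44.1 samples per millisecond at 44.1kHz), then closes its
// output channel so downstream consumers know the sound has ended.
func timed(in <-chan float64, durationMS float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		limit := int(durationMS * sampleRate / 1000.0)
		for i := 0; i < limit; i++ {
			s, ok := <-in
			if !ok {
				return // the input ran out first
			}
			out <- s
		}
	}()
	return out
}

// endless is an infinite source, like the untimed sine wave above.
// Note: its goroutine leaks once the reader stops; fine for a short demo.
func endless(value float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		for {
			out <- value
		}
	}()
	return out
}

func main() {
	count := 0
	for range timed(endless(0.5), 250) { // a quarter of a second
		count++
	}
	fmt.Println("samples:", count) // 11025
}
```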
The sound of silence
Silence is pretty simple: all samples are 0. Technically it could be any constant (the important thing is, no change in pressure), but 0 is the simplest. go-sound implements this in sounds.NewSilence, and has a timed version. You can see the sound of silence above - just a flat line.
Playing notes in series
Now that you have some sounds, the simplest thing you can do is play them one after the other. That is, you play the pressure changes of the first sound, then follow with the pressure changes of the next sound. Conceptually, you are concatenating the sample streams. In the example above, there is a 440Hz sound followed by an 880Hz sound; you can see the point where it changes.
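In channel terms, concatenation is just draining one channel after another into a single output. A minimal sketch of the idea (go-sound's real version, ConcatSounds, will differ in the details):

```go
package main

import "fmt"

// concat plays sounds back to back: it drains each input channel in turn,
// forwarding every sample to a single output channel.
func concat(inputs ...<-chan float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for _, in := range inputs {
			for s := range in {
				out <- s
			}
		}
	}()
	return out
}

// fromSlice turns a fixed slice of samples into a sound channel, for the demo.
func fromSlice(samples []float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for _, s := range samples {
			out <- s
		}
	}()
	return out
}

func main() {
	first := fromSlice([]float64{0.1, 0.2})
	second := fromSlice([]float64{-0.1, -0.2})
	for s := range concat(first, second) {
		fmt.Printf("%.1f ", s) // 0.1 0.2 -0.1 -0.2
	}
	fmt.Println()
}
```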
For example:
Side note: pitch is perceived logarithmically - that is, when comparing tones, you look at the ratio of the frequencies (e.g. 880:440 = 2:1), not the difference. In particular, if you double the frequency (440 to 880), the second tone will sound an octave higher - in technical terms, the 'interval ratio' of 2:1 is a perfect octave. Similarly, 3:2 is a perfect fifth - 660Hz would be a fifth above 440Hz, and indeed we find the note E around 659.26Hz (~= 440 * 2^(7/12), for those interested).
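For the curious, that 2^(n/12) pattern is the whole story of equal temperament: each of the 12 semitones in an octave multiplies the frequency by the same ratio. A quick snippet to check the numbers:

```go
package main

import (
	"fmt"
	"math"
)

// semitonesAbove returns the frequency n equal-tempered semitones above base.
func semitonesAbove(base float64, n int) float64 {
	return base * math.Pow(2, float64(n)/12.0)
}

func main() {
	fmt.Printf("A  = %.2fHz\n", semitonesAbove(440, 0))  // 440.00
	fmt.Printf("E  = %.2fHz\n", semitonesAbove(440, 7))  // ≈ 659.26, close to the 3:2 ratio (660)
	fmt.Printf("A' = %.2fHz\n", semitonesAbove(440, 12)) // 880.00, the 2:1 octave
}
```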
Different waveforms
Sine waves aren't the only repeating pattern - a few other shapes are common, as seen above (image from Wikipedia: https://en.wikipedia.org/wiki/File:Waveforms.svg). Cycling any of these at 440Hz will still 'feel' like A440, but the actual sound will be quite different. All four are available inside simplewaves.go, and can lead to some interesting sounds when switched around.
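As a sketch of how those shapes can be generated - illustrative formulas mapping a phase in [0, 2*Pi) to a sample, not the contents of simplewaves.go:

```go
package main

import (
	"fmt"
	"math"
)

// Each waveform maps a phase in [0, 2*Pi) to a sample in [-1, 1].
func sine(phase float64) float64 { return math.Sin(phase) }

func square(phase float64) float64 {
	if phase < math.Pi {
		return 1.0
	}
	return -1.0
}

func sawtooth(phase float64) float64 {
	return 2.0*(phase/(2*math.Pi)) - 1.0 // ramps from -1 up to 1, then jumps back
}

func triangle(phase float64) float64 {
	// Rises from -1 to 1 over the first half cycle, falls back over the second.
	if phase < math.Pi {
		return 2.0*(phase/math.Pi) - 1.0
	}
	return 1.0 - 2.0*((phase-math.Pi)/math.Pi)
}

func main() {
	for _, phase := range []float64{0, math.Pi / 2, math.Pi, 3 * math.Pi / 2} {
		fmt.Printf("phase %.2f: sine %.2f square %.2f saw %.2f tri %.2f\n",
			phase, sine(phase), square(phase), sawtooth(phase), triangle(phase))
	}
}
```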
To hear this effect:
Chords
Ok, so we can play two notes one after the other, but what about at the same time? This is one of the most useful things about sound: If you play two tones simultaneously, you can hear both - compare this to e.g. looking at two pictures at the same time, where they sort of merge into one. Given two streams of pressure changes over time, how do you produce the samples which sound like both at the same time?
It may be surprising, but the answer is just: add the samples! This is what is happening at your eardrum anyway; your ear (and brain) is smart enough to interpret the combined signal as two different tones. You'll end up with some quite nice looking patterns, and when played through a speaker, it does indeed sound like multiple sounds played at the same time.
One caution: if you're adding two numbers in the range [-1, 1], the result will be in the range [-2, 2], so you need to halve it to bring it back into the required range. go-sound does the adding and scaling for you via sounds.SumSounds(...sounds...), which can combine any number of sounds into one simultaneous sound playing all of them.
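A sketch of the add-and-scale idea with channels (my own toy version - the real SumSounds will differ in details such as what happens when inputs end at different times):

```go
package main

import "fmt"

// sum mixes sounds by adding their samples and dividing by the number of
// inputs, keeping the result inside [-1, 1]. It stops when any input ends.
func sum(inputs ...<-chan float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for {
			total := 0.0
			for _, in := range inputs {
				s, ok := <-in
				if !ok {
					return // one sound finished, so the chord finishes
				}
				total += s
			}
			out <- total / float64(len(inputs))
		}
	}()
	return out
}

// fromSlice turns a fixed slice of samples into a sound channel, for the demo.
func fromSlice(samples []float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for _, s := range samples {
			out <- s
		}
	}()
	return out
}

func main() {
	a := fromSlice([]float64{1.0, 0.0, -1.0})
	b := fromSlice([]float64{1.0, 1.0, 1.0})
	for s := range sum(a, b) {
		fmt.Printf("%.1f ", s) // 1.0 0.5 0.0
	}
	fmt.Println()
}
```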
To hear this in action:
Amplification
Above we needed to multiply all samples by a set value (0.5) - what does this do to the sound exactly? It changes the amplitude of the tones involved, which, unsurprisingly, results in amplification - that is, the loudness changes. Multiplying by 1 gives the original sound, anything > 1 is louder, and anything < 1 is softer (until you reach 0, which is silence!).
go-sound provides MultiplyWithClip, which does the amplification but then also 'clips' the result to [-1, 1]. Note that this clipped version will no longer be the same shape as the original, so it may sound different - this is what people mean when they complain about 'clipping' in music; you can hear it when the volume is turned up too high.
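A sketch of multiply-then-clip (an illustration of the idea, not MultiplyWithClip's actual code):

```go
package main

import "fmt"

// amplify scales every sample by gain, then clips the result back into
// [-1, 1]. Clipping flattens the tops of the wave, which is what changes
// the sound when the volume is pushed too high.
func amplify(in <-chan float64, gain float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for s := range in {
			scaled := s * gain
			if scaled > 1.0 {
				scaled = 1.0
			} else if scaled < -1.0 {
				scaled = -1.0
			}
			out <- scaled
		}
	}()
	return out
}

// fromSlice turns a fixed slice of samples into a sound channel, for the demo.
func fromSlice(samples []float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for _, s := range samples {
			out <- s
		}
	}()
	return out
}

func main() {
	in := fromSlice([]float64{0.2, 0.5, 0.9})
	for s := range amplify(in, 2.0) {
		fmt.Printf("%.1f ", s) // 0.4 1.0 1.0 - the last sample is clipped
	}
	fmt.Println()
}
```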
The result of different amplitudes is this:
Envelopes
So far all the basic tones we create have a constant amplitude - unfortunately, this doesn't tend to happen in real-world music. Instead, one good model for what happens is called an ADSR envelope - a note is split into Attack (amplitude rises as you first hit the note), Decay (it falls back from that peak to a lower, sustained level), Sustain (amplitude stays roughly constant while the note is held) and Release (it drops to zero as the note is let go).
go-sound provides NewADSREnvelope to do ADSR in particular, but enveloping is really a more general idea of multiplying samples by a 'shape', so more may come if any particular shapes are considered useful.
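To show the shape itself, here's a small, self-contained sketch of an ADSR amplitude curve as a function of time - hypothetical parameter names, not NewADSREnvelope's real signature - which would then be multiplied sample-by-sample against a tone:

```go
package main

import "fmt"

// adsrAmplitude returns the envelope's amplitude at time t (in milliseconds),
// rising to 1 during the attack, decaying to the sustain level, holding it,
// then releasing to 0. All durations are in milliseconds.
func adsrAmplitude(t, attack, decay, sustainLevel, sustainEnd, release float64) float64 {
	switch {
	case t < attack:
		return t / attack // ramp up from 0 to 1
	case t < attack+decay:
		return 1 - (1-sustainLevel)*(t-attack)/decay // ramp down to sustain level
	case t < sustainEnd:
		return sustainLevel // hold
	case t < sustainEnd+release:
		return sustainLevel * (1 - (t-sustainEnd)/release) // ramp down to 0
	default:
		return 0
	}
}

func main() {
	// A 1-second note: 50ms attack, 100ms decay to 0.6, sustain until 800ms,
	// then a 200ms release.
	for _, t := range []float64{0, 25, 50, 150, 500, 900, 1000} {
		fmt.Printf("t=%4.0fms amplitude=%.2f\n",
			t, adsrAmplitude(t, 50, 100, 0.6, 800, 200))
	}
}
```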
This sample follows normal sounds with enveloped versions:
Pitch bending
This doesn't really fit well as an image, but pitch bending is the idea that you can transition between two pitches (tone frequencies) smoothly - like bending a guitar string, or moving the slide on a trombone. Rather than having a fixed frequency (as in the tone example), NewHzFromChannel accepts a channel of frequencies - each sample is calculated by continuing the sine wave at the current frequency, which can change from sample to sample. This lets you create things like ambulance sirens, or the famous Shepard tones.
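The key detail is keeping the phase continuous while the frequency changes, so there are no clicks. A sketch of that (my own illustration, not NewHzFromChannel itself):

```go
package main

import (
	"fmt"
	"math"
)

const sampleRate = 44100.0

// bendingSine reads a frequency for every sample and continues the sine wave
// from its current phase, so the pitch can glide without any clicks.
func bendingSine(hz <-chan float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		phase := 0.0
		for f := range hz {
			out <- math.Sin(phase)
			phase += 2 * math.Pi * f / sampleRate // advance by this sample's frequency
			if phase >= 2*math.Pi {
				phase -= 2 * math.Pi
			}
		}
	}()
	return out
}

func main() {
	// Glide from 440Hz to 880Hz over one second.
	hz := make(chan float64)
	go func() {
		defer close(hz)
		for i := 0; i < int(sampleRate); i++ {
			hz <- 440 + 440*float64(i)/sampleRate
		}
	}()
	count := 0
	for range bendingSine(hz) {
		count++
	}
	fmt.Println("generated", count, "samples of a one-second pitch bend")
}
```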
Delay
Also not one that can be shown via image, but delay is a simple idea: take a sound, and add it to a delayed version of itself. As it turns out, this can be implemented using the techniques above: Delay(S, D) = SumSounds(S, ConcatSounds(NewSilence(D), S)), and go-sound offers this via AddDelay.
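Putting the earlier sketches together, delay looks something like this - a toy version of the Sum(S, Concat(Silence(D), S)) formula, where the input channel has to be duplicated first, since a channel can only be drained once:

```go
package main

import "fmt"

// tee duplicates a sound: every sample read from in is sent to both outputs.
func tee(in <-chan float64) (<-chan float64, <-chan float64) {
	a, b := make(chan float64), make(chan float64)
	go func() {
		defer close(a)
		defer close(b)
		for s := range in {
			a <- s
			b <- s
		}
	}()
	return a, b
}

// silence emits numSamples zero samples, then closes.
func silence(numSamples int) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for i := 0; i < numSamples; i++ {
			out <- 0.0
		}
	}()
	return out
}

// concat drains each input in turn into a single output (as sketched earlier).
func concat(inputs ...<-chan float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for _, in := range inputs {
			for s := range in {
				out <- s
			}
		}
	}()
	return out
}

// sumAll mixes inputs sample by sample, treating finished inputs as silence,
// and only stops once every input has closed.
func sumAll(inputs ...<-chan float64) <-chan float64 {
	out := make(chan float64)
	go func() {
		defer close(out)
		for {
			total, open := 0.0, 0
			for _, in := range inputs {
				if s, ok := <-in; ok {
					total += s
					open++
				}
			}
			if open == 0 {
				return
			}
			out <- total / float64(len(inputs))
		}
	}()
	return out
}

// delay mixes a sound with a copy of itself that starts delaySamples later.
func delay(in <-chan float64, delaySamples int) <-chan float64 {
	dry, wet := tee(in)
	return sumAll(dry, concat(silence(delaySamples), wet))
}

func main() {
	src := make(chan float64)
	go func() {
		defer close(src)
		for _, s := range []float64{1.0, 0.5, 0.0} {
			src <- s
		}
	}()
	for s := range delay(src, 1) {
		fmt.Printf("%.2f ", s) // 0.50 0.75 0.25 0.00
	}
	fmt.Println()
}
```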
(Note: sound examples for pitch bending and delay are coming, but have triggered a bug in the .wav generation code. Stay tuned, they'll be added back in once fixed!)
Summary
That's it for part 1 of this tutorial - I had planned to also cover spectrograms / FFT / pitch & time shifting, but this is already quite long. Thanks to freesound.org for hosting the sounds - check them out for these sounds (link) or any other freely available samples.
Feel free to start using go-sound for your own coding - of use might be the soundfile package, which lets you convert between Sound objects and .wav files. All files and most images used in this post were generated by the runthrough.go example file.
It'd be interesting to hook it up to some physical devices too - I plugged it into an Arduino with a variable resistor and managed to make my own 'trombone', but I'm sure there's more cool stuff. I also plan to connect it to JACK Audio to see how fast it can process microphone input (imagine: real-time pitch-shifted vocals!), but am currently distracted by adding a mashup-editing UI that uses the algorithms provided.