Audio programming project

Philosobyte · December 6, 2015

I have three ideas for a region-wide science fair project, all involving interpreting audio data programmatically. I would like opinions as to these projects' usefulness, their novelty (is there already a precedent in industry?) and the difficulty of implementation (I have two months to do this, and I'm under a load of schoolwork to boot). I would also like some links to resources which I can use to learn more about audio data and how audio is processed in programming. I currently have no idea how audio data is stored in a file.
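On the storage question: an uncompressed WAV file is a short header (channel count, bytes per sample, sample rate) followed by the samples themselves, usually 16-bit signed little-endian integers. The sketch below is a minimal illustration using Python's standard `wave` module. The helper names `write_sine_wav` and `read_wav_samples`, the 440 Hz tone and the 44100 Hz rate are assumptions chosen for the example, not anything from the thread.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed for illustration)

def write_sine_wav(target, freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit PCM sine tone to a path or file-like object."""
    n = round(seconds * sample_rate)
    frames = bytearray()
    for i in range(n):
        value = math.sin(2 * math.pi * freq_hz * i / sample_rate)
        # Scale [-1, 1] to half of the 16-bit range and pack little-endian.
        frames += struct.pack("<h", int(value * 32767 * 0.5))
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))

def read_wav_samples(source):
    """Return (sample_rate, samples) with samples scaled to [-1.0, 1.0)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [v / 32768.0 for v in ints]

# Round-trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
write_sine_wav(buffer, 440.0, 0.01)
buffer.seek(0)
rate, samples = read_wav_samples(buffer)
print(rate, len(samples), samples[:4])
```

Passing a filename such as `"tone.wav"` instead of the `BytesIO` buffer writes an ordinary file that any audio player should open.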

Idea 1. Detecting the difference between a person's spoken voice and that same person's recorded voice being played on a speaker. I imagine this being useful for multiple-factor authentication.

One way I've thought of accomplishing this is to use the typical frequency response characteristics of speakers (abnormal peaks and valleys, reduced bass and high-end response). Of course, if the speakers used are accurate enough or calibrated to play without distortion, then this method would not work, but at least that means robbers have to conceal and carry a studio monitor + interface + computer around. Other things I might potentially explore include phase and sampling rate (do human vocal cords produce any sound above 22 kHz?).

Idea 2. A much simpler idea: a practice app for singers which listens to the frequency you're singing, calculates the closest musical pitch which corresponds to that frequency, and plays a sine wave of that frequency (+ harmonics?) on your mobile device's speaker or headphones so you can match your pitch to it and train intonation. The mobile device will pick up its own speaker sound and chassis resonance, so I plan on phase-cancelling that in calculations. Perhaps Autotune can be considered an industry precedent for its pitch-snapping calculations, and noise-cancelling with phase has existed pretty much forever, so I'm not sure how "new" this would be considered in the sense of technology.
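The "closest musical pitch" step in Idea 2 can be sketched with the standard equal-temperament formula, where each semitone is a factor of 2^(1/12). This assumes A4 = 440 Hz as the tuning reference and MIDI-style note numbering (A4 = 69); the function name `nearest_pitch` is made up for the example.

```python
import math

A4_HZ = 440.0  # assumed concert-pitch reference
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_pitch(freq_hz):
    """Return (note name, target frequency in Hz, error in cents)."""
    # Semitones away from A4, rounded to the nearest whole note number.
    midi = round(69 + 12 * math.log2(freq_hz / A4_HZ))
    target = A4_HZ * 2 ** ((midi - 69) / 12)
    # 100 cents = 1 semitone; positive means the singer is sharp.
    cents = 1200 * math.log2(freq_hz / target)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, target, cents

name, target, cents = nearest_pitch(452.0)
print(name, round(target, 2), round(cents, 1))
```

The `target` value is the frequency the app would play back, and `cents` is what a readout would show as "how far off" the singer is.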

Idea 3. Sound localization using two omnidirectional microphones. I theorize that sound localization using differences in time and amplitude can only occur in dimensions of the number of microphones - 1. That means 2 microphones = 1 dimension, 3 microphones = 2 dimensions, and 4 microphones = 3 dimensions. I'm limited to two microphones, so I only get one dimension, which isn't very useful.

However, the human ear changes frequency response when sound hits the ear at different spots. Humans compare the sound input to their memories, so for humans that frequency response change can be broad: computers can't do that yet, so I'm thinking we need to use narrow peaks. Can we use 3D-printed materials in certain shapes so that when sound hits them at certain angles, they consistently add frequency response peaks at obscure frequencies, such as 70 Hz or 5000 Hz? That way, we could create artificial "ears" which help determine whether sound is coming from in front of or behind the microphones.

Any opinions/info is appreciated. Thanks guys.

December 6, 2015

Idea 2 would be tricky because the human voice (and any instrument for that matter) has harmonics that you would need to account for. Also, that isn't really how musicians who must have intonation learn best in my experience. Most are satisfied with a pitch pipe or piano.

Idea 3 sounds the most interesting to me. I would try to 3d print ears and measure the frequency response of a sound coming from different directions. You could compare it to the microphones bare, or with a symmetrical "ear" structure. The tricky part is that the 3d printing plastic probably has different acoustic reflectivity properties than a real ear.

mathijs727 · December 6, 2015

I don't know good resources to learn from except for a course in Digital Signal Processing.

But at least look at the Fourier Transform, it is used to determine which frequencies occur in a signal.

Then use an FFT (Fast Fourier Transform), which is an O(N log N) algorithm for computing the discrete Fourier Transform (at least if the number of samples is a power of 2).

For FFT there are libraries available that you could use.
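To make the O(N log N) idea concrete, here is a minimal sketch of the recursive radix-2 Cooley-Tukey FFT in pure Python. It only illustrates why a power-of-2 length is convenient (the signal is split in half at every level) and is not a substitute for an optimized library. The test tone, the 8192 Hz sample rate and the helper `dominant_frequency` are assumptions for the example.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of 2."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of even-indexed samples
    odd = fft(samples[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest-magnitude bin below Nyquist, ignoring DC."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)

SAMPLE_RATE = 8192
N = 1024  # bin width = SAMPLE_RATE / N = 8 Hz
tone = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(N)]
print(dominant_frequency(tone, SAMPLE_RATE))  # prints 440.0
```

The frequency resolution is the sample rate divided by N, so a pitch that falls between bins is only located to the nearest bin unless some interpolation is added.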

Azgoth 2 · December 6, 2015

My input, as a Linguist with a great interest in acoustics/phonetics/phonology, and who's played a lot with manipulating sound.

Idea 1 would be a really cool project, but would probably not work in many environments. In order to be able to reliably differentiate between recorded and human-generated speech, you'd need to have a very, very sensitive/high-fidelity microphone (which might be cost-prohibitive) to distinguish between human voice and any recordings better than "pretty decent". Plus, since a decent speaker and microphone can be used to record and replay sound almost identically to how it actually occurs, it would get extremely tricky from a technological side. Not impossible, mind you, and it would be a really interesting project, but given the constraints you listed, I don't think it's feasible.

But, more importantly, using voice as an authentication relies on the notion of a "voiceprint," i.e., that every person's voice has some unique pattern analogous to a fingerprint which is always present and always uniquely identifies each speaker. The problem is that, for a whole host of phonetic and phonological reasons, this just plain is not true. A person's voice can change dramatically depending on everything from the pitch they're speaking in to whether they're sick or not. A lot of these effects are extremely subtle, and people are completely unaware of them both when they're producing them and when they're hearing them. Plus, some of these factors are determined by things like who's in the room with you--people often unconsciously shift their voices to a higher or lower register or change the timing of their phonemes and so on and so forth when other people are present, and how they change depends on who's present. Lastly, you'd have to do a lot of work to isolate the voice signal from background noise, which can be very difficult for a computer, especially in noisy places like coffee shops or the like.

Again, it's a super cool idea, but it almost instantly smashed headfirst into a lot of issues with speech recognition that Linguists and programmers make their careers solving, and the notion of a "voiceprint" is patently false. That said, if you want to dive headfirst into speech recognition problems, you could still use something like this for a multi-factor authentication. You could do something where an e-mail or text or something gets sent to you with a short phrase to speak, which you then record. Then that input could be fed through a speech recognition package, and if it's the right phrase, grant access. Granted, if you have to build the speech recognition code, this would be a bit overkill, but if not, it could still be neat.

So in other words: as you phrase it, idea 1 would be a great, say, PhD project, but not one for a two-month deadline. Something sort of like it might work, though.

As for idea 2: much more feasible. You can do some simple digital signal processing on the person's voice in real-time, analyze where the frequencies are, and use that to determine your playback data. Some caveats: the human voice actually uses two distinct frequencies to produce vowels (in linguistics, these are called "formants"; sometimes, though, there are three), plus the harmonics, and a lot of more trained singers can have an additional formant called the "Singer's Formant". I don't know offhand how the formants relate to the perceived note that's being sung, but that could be a neat little bit of research in and of itself. Not being a singer, I can't say how useful this would be, and I think you're right that Autotune kind of already does this. Autotune just alters the input signal rather than generating an output. Still, it could be cool. Maybe you could also have it just display the nearest note, the frequency of the note you're singing, etc etc as a sort of data readout.

And as for idea 3: this is my favorite. Ultimately, the human ear only measures pressure, by measuring the displacement of the eardrum (and the resulting displacement of the bone chain, then the movement of fluid in the cochlea) in one dimension. But we can still localize sound in 3D, because of the absolutely insane shape of our ears. Every single part of the ear (except maybe the earlobe, not sure) helps to reflect and amplify various frequencies. This data, along with the phase offset of the sound in each ear and maybe some stuff with the volume, helps us determine things like direction and approximate distance. If you do this, be sure you also print the ear canal, and put a microphone where the eardrum would be. The length and dimensions of the ear canal are absolutely critical to how we hear. Maybe not as regards localization, but definitely as regards general hearing. E.g., it amplifies sounds in the low thousands of hertz range, which happens to be where many of the formant frequencies for vowels are. The only possible concern with this idea is that plastic ears might not have quite the same acoustic properties as fleshy human ears, but I see no reason not to try. Hell, if you feel really ambitious, you could even look at alternate, possibly more effective ear shapes, but that might get into some complicated physics modeling and such.

You could probably 3D print some ears, put the microphone where it needs to be in the ear canal, and look at the different properties of incoming sound at various positions around the head via spectrogram, power spectrum, phase offsets, etc. Then you could calibrate some software to automatically locate the direction of the incoming sound.

To summarize:

Idea 1: Very cool idea, but runs headfirst into every single problem with forensic linguistics and speech recognition.
Idea 2: Also cool, pretty similar to what Autotune does, but could still be very interesting and useful for gathering data on singers.
Idea 3: I love this one. A really cool demonstration of how the shape of the ear helps us localize incoming sound, and there's a lot of opportunity for doing some really neat experimental science here. Plus, it gets really into the details of acoustics and acoustic analysis.

Philosobyte · December 7, 2015

What kind of libraries do you suggest? I'm looking to work with Android (Java).

snip

I've noticed that issue with idea 2 you stated when trying to tune my piano with a generic tuning program. In the extreme octaves, the program recognizes a harmonic over the base pitch. If I do this idea, I would be banking on the strongest human voice harmonics being in octaves of the same pitch. I'm going to have to disagree about singers, though; I'm not a professional singer and I don't have time to sing at the piano, but I still have to sing well for my musical occupation. Oftentimes when I'm humming while brushing my teeth, driving, etc. I wished that I could be checking my intonation on my phone.

snip

Good info. I agree that idea 1 would have to be extremely complex in order to work reliably, so I will stop considering it. I'll research formants. I noticed in some rough frequency analysis of vocal recordings that there were many peaks (harmonics), but two or three of the largest peaks (formants?) had nearly equal amplitude. I'll look into it further and hope that the peaks are octaves of the same pitch. Because if they're not, implementation of this idea will be much more difficult.
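One caveat on the "octaves" hope: harmonics sit at integer multiples of the fundamental (2x, 3x, 4x, ...), so only some of them are octaves; the third harmonic, for example, is an octave plus a fifth above the sung note. A common way around "the loudest peak is a harmonic" is to estimate the period of the waveform instead of picking the biggest spectral peak. Below is a minimal autocorrelation sketch under assumed conditions: a synthetic tone with a weak 200 Hz fundamental and louder 2nd and 3rd harmonics, an 8000 Hz sample rate, and a made-up function name `autocorrelation_pitch`. Real voices are noisier, so treat it as a starting point only.

```python
import math

def autocorrelation_pitch(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental as sample_rate / (lag of strongest self-similarity)."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)

    def score(lag):
        # How well the signal matches itself shifted by `lag` samples.
        return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))

    best = max(range(min_lag, max_lag + 1), key=score)
    return sample_rate / best

SAMPLE_RATE = 8000
F0 = 200.0
# Synthetic "voice": weak fundamental, stronger 2nd and 3rd harmonics (amplitudes assumed).
tone = [
    0.2 * math.sin(2 * math.pi * F0 * t / SAMPLE_RATE)
    + 1.0 * math.sin(2 * math.pi * 2 * F0 * t / SAMPLE_RATE)
    + 0.8 * math.sin(2 * math.pi * 3 * F0 * t / SAMPLE_RATE)
    for t in range(2048)
]
print(autocorrelation_pitch(tone, SAMPLE_RATE))
```

Here the loudest spectral component is at 400 Hz, but the waveform only repeats every 40 samples, which corresponds to the 200 Hz fundamental.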

I also like idea 3 because it seems more like new tech than idea 2, which is just an implementation of existing tech, or reinventing the wheel. However, as I mentioned before, I need these artificial ears to be different from real ears. Instead of changing a range of frequencies, I need it to amplify a single frequency so there's a recognizable peak when sound comes from a certain direction. I've been holding up a bunch of half-sphere-shaped materials to my microphones to try to get a single frequency peak, and none of the materials I used - plastic, glass, aluminum foil, plywood - worked. They only changed the frequencies broadly. I think the problem is a half-sphere shape doesn't result in many reflections. I need to use a shape which has standing waves, like a box, sphere, cone, or cylinder, so large numbers of reverberations create a resonant peak, but those shapes would look ridiculous on an ear.

If I can get a non-ridiculous shape with a resonant peak, then I will go with idea 3. Otherwise, I'll have to go with idea 2.

Thanks for the opinions guys. Still open to ideas/resources.

Azgoth 2 · December 7, 2015

Formants are super cool. There are usually two very strong peaks for vowels, plus a much lower one (~125-200 Hz) from the vibration of the vocal folds themselves (the fundamental frequency), but this last one isn't considered a "formant". It's just "voicing".

As for the ears: the human ear shape actually is great at doing the amplifications that we need. It's actually not just the frequencies and amplification that matter--your brain actually registers the offset in the phase of the sound hitting both ears, which helps you locate left/right. You should be able to use human-shaped ears--I don't know why you couldn't--but you'd have to do calibrations. Get your stuff set up, play a sound from a known location, record it, move the sound source, rinse and repeat. You also probably don't want to use something like a half-sphere, since it's too symmetric. There's not enough difference between different directions to differentiate them. So you need something with a kind of funky shape, so that sound from all different directions has different things happen to it. Do some research into the acoustics of the human outer ear--I still think that would work, but again, you'd have to calibrate it (but that's true of any setup).

It's also important to know that human hearing is logarithmic in both frequency and intensity (volume): we perceive 100 Hz and 1000 Hz as being about as far apart as 1000 Hz and 10,000 Hz. We have very good frequency resolution under about 4,000 Hz, and increasingly poor resolution as you go higher and higher. The ear canal also amplifies a lot of sound in this low-thousands of Hz range, so differences in incoming amplitude are going to be more prominent. But also don't ignore phase--phase is super duper important for determining the leftness and rightness of incoming sound!

Philosobyte · December 8, 2015

Yep, I plan on using phase and amplitude to determine the sound source's position on the x-axis. However, I would also like to be able to determine direction on the y-axis, and that's not possible when using phase and amplitude alone. That's why I plan on using a resonance peak for y > 0 and one for y < 0. So, phase and amplitude take care of the x-axis, and the peaks take care of the y-axis. And since I'm not using a third microphone, I will only be able to determine direction, not distance. I originally wanted to use four microphones and no ears so that I could use purely phase and amplitude without touching frequency response. Unfortunately, that would have ended up costing the school $900, so I had to scale back and stick with the two microphones I already own.
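The x-axis part can be sketched as a time-difference-of-arrival estimate: cross-correlate the two microphone signals to find the delay between them, then convert the delay to an angle with sin(angle) = c * delay / spacing. Everything specific here is an assumption for illustration: a distant source, 343 m/s for the speed of sound, a 0.2 m microphone spacing, a 48 kHz sample rate, and the function names `best_lag` and `arrival_angle`.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def best_lag(left, right, max_lag):
    """Number of samples by which `right` trails `left` (negative if it leads)."""
    def correlation(lag):
        total = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                total += left[i] * right[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=correlation)

def arrival_angle(lag_samples, sample_rate, mic_spacing_m):
    """Angle from straight ahead, in degrees; positive means nearer the left mic."""
    ratio = SPEED_OF_SOUND * lag_samples / (sample_rate * mic_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))

# Simulated recording: the right microphone hears the same noise 7 samples later.
rng = random.Random(0)
source = [rng.uniform(-1, 1) for _ in range(2000)]
true_delay = 7
left = source
right = [0.0] * true_delay + source[:-true_delay]

lag = best_lag(left, right, max_lag=20)
print(lag, arrival_angle(lag, 48000, 0.2))
```

A source mirrored to the rear produces the same delay, which is the front/back ambiguity the resonance peaks are meant to resolve.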

I think the reason our ears function so well with only broad frequency changes is that our brains already have impressions of what things sound like in front of them and behind them, and those broad frequency changes are simply compared to our memories. Microphones don't have anything to compare the sound to. How do microphones know with reasonable accuracy whether that 4k to 6k presence they're hearing is created by the ear or just part of the audio source's natural frequency range?

Update on shapes: I'm having no luck with cylinders or cones. It appears that the aluminum foil and paper cylinders I used decreased the frequencies they were supposed to amplify (800Hz for aluminum and 512Hz for paper; and their harmonics). This is quite discouraging.

Azgoth 2 · December 8, 2015

There's definitely an effect from the whole "we've been alive for a long time and have learned what things sound like", but as I said, my understanding is that the shape of the ear allows incoming sound from all directions to be differentiated. This is where I really wish I knew more on the specifics of how it does that. As for the narrow frequency range: that's actually a purely mechanical (or physical) phenomenon. Our ear canal amplifies a certain range of frequencies (which actually has more of an effect on how loud something has to be for us to perceive it), and our cochlea--which looks kind of like a snail shell, getting narrower as it spirals inwards--has a greater resolution in lower frequencies by purely mechanical means. A low frequency range, say, 100-200Hz, matches the resonance of a much longer portion of the cochlea than a higher range of equal size, e.g. 5000-5100Hz. This essentially means that for a low frequency, since individual Hz values are more spread out, it's much easier to tell what part of the cochlea is being activated. For higher frequencies, since they're crammed so close together, it's harder to pinpoint a specific frequency. Admittedly, while that's just super damn cool to me, it's not necessarily relevant to using a microphone to do directional location. (EDIT: I realized after I hit "post" that this paragraph might sound antagonistic, which was not my intent. The subject of human hearing and speech was just part of what I spent a lot of time studying at university, and I think it's super cool, so I like talking about it a lot. I'm not trying to sound antagonistic or condescending)

Here's a thought: are you just trying to get something that works for determining direction of incoming sound? Because then you don't need to worry about what, specifically, is happening with the frequencies, just that something changes as you move it around one axis. So you might not care that you're getting a dampening effect at 800Hz with aluminum, as long as something else changes as you move the sound source around. It doesn't matter what frequencies get amplified and dampened as you move it around, just that some frequencies do, and that the pattern is unique for each direction. If you're trying to replicate what happens in the human ear as closely as possible, though, you're going to ultimately need something shaped like the human ear.

I suspect that the material you use is, pardon the pun, somewhat immaterial. The shape is probably a lot more important. Have you tried something really, prominently asymmetrical along all axes? Or using completely different shapes for each side/microphone/"ear"? E.g., one ear has half of an ellipse, the other has some weird, jaggedy shape?

Philosobyte · December 8, 2015

Okay, I now see what you're saying about the ear's various shapes having an effect on resolution as well as frequency and phase. But the ear seems so complicated with its nooks and crannies and subtle effects that, for this project at least, I don't think I'll have enough time to research and simulate a realistic human ear in programming. This is another thing more suited for a PhD project. You needn't worry about your writing tone because I'm often the same way. And you're being very helpful. If we can deal with SSL's condescending bluntness, we can deal with anything.

Yes, I'm just trying to get something that works for determining direction of incoming sound. Even if the pattern is unique for each direction, though, that doesn't seem to solve how to tell that the pattern is caused by the ear, and not the source's natural FR. The sources are in a static position. For example, say the cylinder I used on my microphone dampened my voice at 800 Hz from -18.2 dBFS to -21.2 dBFS while leaving 700 Hz and 900 Hz untouched at -40 dBFS. If the microphone only heard the version with the cylinder (-21.2 dBFS), how would it know that the cylinder had any effect? The main problem I'm having right now with this idea is that every shape I try has only a very small effect on the response, which could even be attributed to random variation in the source audio. I agree that the material doesn't change the fundamental effect of the shape, only the intensity of that effect. However, I haven't tried asymmetrical shapes yet because that would, in theory, reduce the amplitude of the frequency change, and I need to be able to obtain a good amplitude change first.
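For scale, the numbers in that example work out as follows. Decibels are logarithmic, with 0 dBFS meaning digital full scale, so a level difference converts to an amplitude ratio via 10^(dB/20). The two levels are the ones quoted above; the helper names are made up for this sketch.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20 * math.log10(ratio)

bare_dbfs = -18.2      # level at 800 Hz without the cylinder (from the post)
cylinder_dbfs = -21.2  # level at 800 Hz with the cylinder (from the post)

change_db = cylinder_dbfs - bare_dbfs
ratio = db_to_amplitude_ratio(change_db)
print(round(change_db, 1), round(ratio, 3))
```

A 3 dB drop is roughly a 29% reduction in amplitude, which is small enough that ordinary variation in a voice could plausibly hide it without a known reference signal.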

I might just stick with idea 2 because resonance, the main concept I was going to use for idea 3, doesn't appear to be working. And idea 2 is less complex to implement.

Azgoth 2 · December 8, 2015

Yes, a full acoustic study of the human ear would be a much more in-depth project than this. But certainly a lot of fun! It's a beautiful combination of Linguistics, physics, and audiology.

As for the problems with determining directionality: I have a few ideas on things you can try.

You should make sure you have a consistent experimental setup. Same source sound each time, same volume, only varying the direction, minimizing the effects of the room/environment you're recording in. You should be able to accomplish this to reasonable satisfaction by using, say, a speaker as your sound source, set to a constant volume, playing the same sound every time. Something like white noise would be a good start, since white noise has, on average, equal intensity at all frequencies, each frequency having its own random phase (you can still look at the phase difference in the whole waveform, though--each individual frequency just has a randomized phase when you do the Fourier decomposition). Make sure it's something like an actual file on your computer or whatever, so it's exactly the same each time--trust me, calibration and looking at your data become completely impossible otherwise. This should let you see any resonance effects pretty easily. Then make sure you're not in a location with a lot of echoes, since then you have to worry about the acoustics of the experimental space. Then make sure you're only varying the direction, not amplitude/what the sound is/how much you capture/distance/etc, from the microphone array. For convenience's sake, it's probably easiest to change the yaw of the speakers and the pitch and roll of the microphones between trials, to use some airplane terminology. Then, make sure you do a number of trials at several distinct locations, capture probably a few seconds of sound, then move the speaker and do another one. It'll probably be a lot of data, admittedly.
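One way to get "exactly the same each time" is to generate the noise once from a fixed random seed and save it as a file. The sketch below does that with Python's standard library only. The function name `write_white_noise`, the seed, the 0.5 peak level and the 44100 Hz rate are all assumptions for illustration.

```python
import io
import random
import struct
import wave

def write_white_noise(target, seconds, sample_rate=44100, seed=1234, peak=0.5):
    """Write seeded uniform white noise as mono 16-bit PCM; same seed, same file."""
    rng = random.Random(seed)
    n = round(seconds * sample_rate)
    # Independent random samples give a spectrum that is flat on average.
    frames = b"".join(
        struct.pack("<h", int(rng.uniform(-peak, peak) * 32767)) for _ in range(n)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)

# Two runs with the same seed produce byte-identical output.
first, second = io.BytesIO(), io.BytesIO()
write_white_noise(first, 0.1)
write_white_noise(second, 0.1)
print(first.getvalue() == second.getvalue())
```

In practice the first argument would be a filename such as `"noise.wav"`, and that one file would be played for every trial.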

Also, and this would require another microphone, make sure you're recording the sound at the microphones' location with a third microphone, one that's not in an "ear", so you can have a reference signal. If you're using white noise, or some very specific, very controlled, consistent sound file, this isn't so much of an issue since you already have the reference: the file you're playing.
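One possible way to use such a reference, sketched here with NumPy: compare the power in each frequency band of the "ear" recording against the reference, expressed in dB. The function name `band_response_db`, the choice of eight equal-width bands, and the Hann window are all assumptions made for illustration. It also assumes both signals share a sample rate and are already loaded as float arrays.

```python
import numpy as np


def band_response_db(reference, recorded, sample_rate, n_bands=8):
    """Return (low_hz, high_hz, level_db) per band: recorded relative to reference."""
    n = min(len(reference), len(recorded))
    window = np.hanning(n)  # taper the ends to reduce spectral leakage
    ref_power = np.abs(np.fft.rfft(np.asarray(reference[:n]) * window)) ** 2
    rec_power = np.abs(np.fft.rfft(np.asarray(recorded[:n]) * window)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    edges = np.linspace(0.0, sample_rate / 2.0, n_bands + 1)
    result = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (freqs >= low) & (freqs < high)
        # Guard against an empty or silent reference band.
        ref_sum = max(ref_power[mask].sum(), 1e-20)
        rec_sum = max(rec_power[mask].sum(), 1e-20)
        result.append((low, high, 10.0 * np.log10(rec_sum / ref_sum)))
    return result
```

A band that comes out several dB above or below its neighbours would hint at a resonance or a notch introduced by the "ear"; averaging over wide bands is a deliberate simplification that hides narrow features.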

The more of those paragraphs I typed, the more I realized that just the experimental/calibration part of this is probably going to be a lot of work. Maybe not hard work (until you get to the part where you determine, say, how far left/right/front/back/up/down the source is from the microphones), but certainly a lot of it. I was trying to go into detail, though, so it might not be as complicated as it looks at first read. On the other hand, I'm very used to thinking about experimental design, so what's second nature to me might not be second nature to you (I don't know how much experimental/laboratory stuff you've done, e.g. for classes).

Idea 2 could always still work, though, since it seems like it might be easier to get right to the "solving the problem" step there rather than spending huge amounts of time on setting up and collecting data and analyzing and tweaking and that whole process. As cool as the ear idea is (especially the more I think about it), after thinking through what I had to write for this post, it might be quite time-consuming.

Philosobyte · December 8, 2015

Those are good ideas for calibration, and I actually imagine calibration being the least time-consuming part of the work. I think you're missing one of my main worries, though. My main worry isn't that I won't be able to calibrate things, but that the calibration can't be applied to anything in real life because the effect on amplitude isn't large enough to be considered out of the ordinary. While peaks and valleys in frequency response are easily noticeable in flat/rounded sections of a spectrum, as with pink noise or many percussive sounds, almost all melodic instruments and voices consist of extremely narrow peaks and valleys with large slopes which look like plus or minus 25dB/10Hz on the graph. There's no way we're applying an FR (frequency response) filter to those kinds of things. The slopes and valleys make it so that a lot of the frequencies from the filter don't even exist, or there's a 50dB difference from one spot to the other.

Azgoth 2 · December 8, 2015

Ah. I was missing that that was a major concern. Thinking about it, it probably is pretty major, since as you said, we as humans have such a store of experience (many, many years) of locating sound, with special, highly tuned, millions-of-years-evolved tools for doing so. Replicating it artificially is probably very hard without some additional microphone to use for collecting a reference signal, e.g. an omnidirectional mic that outputs to a mono audio channel (so it can pick up sound from any direction about as well as any other).

And now, of course, I'm off thinking of ways to get around that issue. E.g., use a single microphone, but have multiple different apertures that channel sound into it, with wildly different shapes from each direction. But at this point I'm way down the rabbit hole and will be thinking about this for a few days yet, and it might not be any less work to try and re-do the approach.

(Welcome to the wonderful world of experimental design, where nothing works like you wanted it to, and complexity goes as n^n, where n is how complex you expected it to be!)

Philosobyte · December 9, 2015

I never thought of using a third microphone as a reference point instead of as a part of calculations. Too bad I can't use a third one.

Different apertures with a single microphone would probably work if the apertures were large enough to change the frequency dramatically, like horns. However, it would end up being a large, impractical structure.

Thanks for everything about idea 3. It's just a bit too complicated for a high school project, though, and I'll leave it to the future. Right now my attention should be focused on idea 2.

Azgoth 2 · December 9, 2015

Yeah, making apertures different enough to have measurable differences would probably lead to very large ones. Or at the very least, it would take a long time to make good ones.

And glad to throw ideas around. It's a fun project idea (to think about--very frustrating to actually construct, it would seem). Yeah, number 2 is probably a much better high school project. And it should be simpler, since there are probably fewer parts to it. (Record sound --> fast Fourier transform --> mathematical analysis of spectrum --> play corresponding output sound).
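A minimal sketch of the middle of that pipeline, assuming NumPy and assuming the recording is already available as an array of samples (actual microphone capture and speaker playback need a platform audio library and are left out). The function names, the A4 = 440 Hz tuning, and the "pick the strongest FFT bin" approach are illustrative choices; for a real voice the strongest bin can be a harmonic rather than the fundamental, so this is a starting point only.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest FFT bin."""
    windowed = np.asarray(samples) * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    spectrum[0] = 0.0  # ignore the DC component
    peak_bin = int(np.argmax(spectrum))
    return peak_bin * sample_rate / len(samples)


def nearest_pitch(freq_hz, a4_hz=440.0):
    """Snap a frequency to the nearest equal-tempered note: (name, frequency)."""
    midi = int(round(69 + 12 * np.log2(freq_hz / a4_hz)))  # MIDI note 69 is A4
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, a4_hz * 2 ** ((midi - 69) / 12)


def sine_wave(freq_hz, seconds, sample_rate):
    """Generate the target tone as float samples in [-1, 1]."""
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)


if __name__ == "__main__":
    rate = 44100
    sung = sine_wave(435.0, 1.0, rate)  # stand-in for a slightly flat sung A4
    detected = dominant_frequency(sung, rate)
    name, target = nearest_pitch(detected)
    tone = sine_wave(target, 1.0, rate)  # samples to hand to an audio output
    print(name, target)
```

The frequency resolution here is `sample_rate / len(samples)`, so a one-second window resolves about 1 Hz; shorter windows respond faster but resolve pitch more coarsely.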
