There are two things I’m talking about here. One is that I think the warring audio factions might be talking about two very different things (although the FR people seem to think there’s only one thing?). The other is which of the two I think is more important. It’s a wall of words, and in the end I’m not sure I truly understand it myself, so I’m probably gonna get torn to shreds for suggesting it.

I probably should use the word “timing” instead of “time domain”

I think I personally value the timing realm more than the frequency (pitch) realm. The audio engineers are right… you can only discern so much in terms of pitch. Hearing spans roughly 20 Hz to 20,000 Hz, and even that’s generous considering 16,000 Hz is already the limit for lots of older listeners. They’re also right that there are psychoacoustic effects at play. BUT I wonder if they forget about timing, because from what I can tell, all ‘measurements’ in audio relate to frequency response (pitch) and not timing. A visual equivalent might be: frequency response is the color spectrum, and timing is the frame rate (frames per second).

Maybe all the in-fighting over the topic comes down to this misunderstanding? On one side you have the equivalent of the FR people focusing on ‘color reproduction’, saying “You can’t even see infrared light!” or “If you adjust the color, then the two pictures are exactly the same.” But team “timing” is talking about resolution and motion fidelity, not necessarily color reproduction.

For example: how do we determine the location of sounds? By the difference in timing between when a sound reaches the left ear and the right ear. That difference can be as small as 10 microseconds, according to this article:

https://www.sciencefocus.com/science/why-is-there-left-and-right-on-headphones
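
To put that 10 µs figure in physical terms (my own back-of-the-envelope check, assuming roughly 343 m/s for the speed of sound in air), it corresponds to only a few millimetres of extra path length to the far ear:

```python
# Rough check: how far does sound travel in 10 microseconds?
speed_of_sound = 343.0   # m/s in air at ~20 °C (assumed)
itd = 10e-6              # interaural time difference, in seconds

path_difference_mm = speed_of_sound * itd * 1000
print(f"Path difference: {path_difference_mm:.2f} mm")  # about 3.4 mm
```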

Another article suggests that humans can detect even smaller timing differences (3 to 5 microseconds):

https://phys.org/news/2013-02-human-fourier-uncertainty-principle.html

So many things can be explained by this. Spatial cues like staging and imaging. Transients and textures depend on the speed of changes in frequency, not the frequencies themselves. I think those same things shape how detailed and resolving gear seems, and they relate to micro- and macro-dynamics. It’s known that if you compare a piano note to a guitar note, it’s the brief attack characteristics, the pluck versus the hammer, that clue us into which sound comes from which instrument. I think all of the “life-like” qualities are mostly timing-dependent rather than frequency- or pitch-dependent.

From what I can tell, the things that make hi-fi gear stand out from the cheapest gear with good EQ applied are tied to timing. I’ve been lucky enough to go to a CanJam and listen to very expensive gear and everything below it in price. To my ears there IS a difference, and it didn’t matter what the price tag said; I wasn’t gonna buy the expensive stuff anyway, I just wanted to hear the differences for myself.

I’ve listened to things that “measure perfectly”, like the near-perfect Dan Clark Stealth and Dan Clark Expanse. Dan Clark uses metamaterials to help dampen and “shape” the sound, and coincidentally they measure almost exactly to the Harman curve. I’ve listened to many Chi-Fi DACs and amps that also measure perfectly (they all use mounds of negative feedback). And to my ears, those are some of the most boring and lifeless things to listen to.

So in my opinion, faithful reproduction of frequency is NOT the holy grail. You can EQ things any way you like, and I agree that EQ is excellent! It changes the sound more than most things. But good FR performance is cheap, in my opinion, and that’s great. What’s not widely available is gear that performs well in timing. From what I can tell, that’s what people pay up for.

I’d be interested to see whether the industry one day starts creating ways to measure time-domain performance. In my analogy above I used the metaphor of “frames per second”, but timing resolution can also be expressed in Hz. In the first article, humans can use timing cues as small as 10 microseconds (µs) to position a sound source, which equates to 100,000 Hz. In the second article, humans can detect differences as small as 3 µs. That article mentions time-difference detection 10x to 13x better than expected, so if 3 µs is the extreme 13x case, the other participants were closer to 4 µs, the 10x figure. Going by the 4 µs figure, that would equate to 250,000 Hz of resolution. It’s not about pitch, it’s about changes in the audio.
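
The conversions above are just the reciprocal of the time interval. Here is a minimal sketch of that arithmetic (my own numbers check, not something from either article):

```python
# Equivalent "rate" for a given timing resolution: f = 1 / delta_t
def timing_to_hz(delta_t_seconds: float) -> float:
    return 1.0 / delta_t_seconds

for label, dt in [("10 us localization cue", 10e-6),
                  ("4 us detection threshold", 4e-6)]:
    print(f"{label}: {timing_to_hz(dt):,.0f} Hz")
# 10 us -> 100,000 Hz; 4 us -> 250,000 Hz
```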

  • meato1@alien.topB · 11 months ago

    You mention timing differences between the left and right ears contributing to sound localization, which is true, but what does that have to do with headphones?

  • xymordos@alien.topB · 11 months ago

    I’ve always wondered… say, in multi-BA IEMs, what happens if you somehow set the high-frequency driver to play perhaps a millisecond later than the bass driver?

  • Mad_Economist@alien.topB · 11 months ago

    Time domain is eminently measurable and, indeed, measured as a direct consequence of frequency response measurements in essentially every piece of measurement software. This is because the default paradigm for measuring headphones is a Fourier transform of an impulse response, which gives us both the magnitude and phase values as a function of frequency.
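
To make that concrete, here is a rough numpy sketch of the idea (the impulse response below is a made-up damped resonance, not a real headphone measurement): a single FFT of the impulse response yields both the magnitude curve plotted as “FR” and the phase that carries the time-domain information.

```python
import numpy as np

fs = 48_000                      # sample rate, Hz
t = np.arange(0, 0.02, 1 / fs)   # 20 ms of time

# Toy impulse response: a damped 2 kHz resonance standing in for a measured headphone
h = np.exp(-t / 0.002) * np.sin(2 * np.pi * 2000 * t)

H = np.fft.rfft(h)                       # complex frequency response
freqs = np.fft.rfftfreq(len(h), 1 / fs)  # frequency axis in Hz

magnitude_db = 20 * np.log10(np.abs(H) + 1e-12)  # what an "FR graph" shows
phase_rad = np.unwrap(np.angle(H))               # the time-domain side of the same data
```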

    The video metaphor is quite misleading because our eyes are capable of detecting multiple inputs at once, whereas our ears are pressure detectors - there are no “hearing pixels”, just a set of bandpasses that come after the sum of sound pressure in our ears moves the drum. That is, there’s only one variable we’re looking at (the displacement of the eardrum at any given time), whereas with our eyes we have intensity across multiple points.

    The reason that time domain measurements of headphones, amplifiers, and so on are not discussed is that there simply is no ‘there’ there - headphones and amplifiers can be accurately approximated as minimum phase systems within their intended operating bandwidth and level, and the only cases where this isn’t true of DACs are when it’s intentional (the use of linear phase filters for reconstruction, for example). This being the case, we can infer the time domain behavior directly from the frequency response behavior.
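
For what it’s worth, that inference is directly computable: for a minimum phase system, the phase follows from the magnitude alone (the Hilbert transform of the log-magnitude). Below is a rough numpy sketch of the standard real-cepstrum construction, assuming the measured magnitude really does belong to a minimum phase system:

```python
import numpy as np

def minimum_phase_from_magnitude(mag: np.ndarray) -> np.ndarray:
    """Minimum-phase complex response from a one-sided magnitude response
    (same length as an np.fft.rfft output), via the real cepstrum."""
    log_mag = np.log(np.maximum(mag, 1e-12))

    # Rebuild the full two-sided log-magnitude spectrum (even symmetry)
    full = np.concatenate([log_mag, log_mag[-2:0:-1]])
    n = len(full)

    # Real cepstrum, then fold negative quefrencies onto the positive side
    cepstrum = np.fft.ifft(full).real
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0

    # exp of the folded (analytic) log-spectrum gives the minimum-phase response
    H_min = np.exp(np.fft.fft(cepstrum * fold))
    return H_min[: len(mag)]
```

Comparing a measured phase (with any pure delay removed) against the phase this returns is essentially what an “excess phase” check does.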

    “Timing” is also an issue where you really need to look at the source material - a bandwidth-limited system (like a recording microphone, preamp, and ADC) can only produce a “transient” change at a given speed, which is set by the frequency response of the system. A faster rise time requires, symmetrically, a larger bandwidth (at high frequencies, specifically). This is why you see - or saw - people measuring amplifiers with square waves and other “instantaneous” rise time signals. But if you feed those through the lowpass inherent to your ADC, or for that matter the microphone used for the recording itself, you’ll find that your transient is slowed, because those systems have a high-frequency cutoff.
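
A quick way to see that bandwidth/rise-time trade-off is to push a step through a lowpass, as in this rough scipy sketch (the 4th-order Butterworth and 20 kHz cutoff are arbitrary stand-ins for a mic/ADC bandwidth limit):

```python
import numpy as np
from scipy import signal

fs = 192_000                        # sample rate, Hz
t = np.arange(0, 0.01, 1 / fs)
step = (t >= 0.005).astype(float)   # an "instantaneous" edge at 5 ms

# Lowpass roughly representing a bandwidth-limited recording chain
b, a = signal.butter(4, 20_000, fs=fs)
filtered = signal.lfilter(b, a, step)

# 10%-90% rise time of the filtered edge, in microseconds
edge = filtered[t >= 0.005]
t10 = np.argmax(edge >= 0.1) / fs
t90 = np.argmax(edge >= 0.9) / fs
print(f"10-90% rise time: {(t90 - t10) * 1e6:.1f} us")  # roughly 20 us, not instantaneous
```

Halve the cutoff and the rise time roughly doubles, which is the symmetry described above.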

    • wagninger@alien.topB · 11 months ago

      Maybe you know more about this or have a source for how this works, but your comment reminded me of something that is a bit of a mystery to me: if the ear is a pressure detector, how does stuff like staging work in headphones, when there are just 2 membranes for output and 2 ears for input?

      I get how it works that you hear sounds more on the left than on the right, that’s just a difference in volume… but precise positioning on something like a virtual stage?

      • josir1994@alien.topB · 11 months ago

        By the asymmetry of your ears. Sound waves get diffracted and scattered differently when they come from different directions - front or back, top or bottom, etc. - and you learn to distinguish between them by using the same pair of ears.

        • wagninger@alien.topB · 11 months ago

          I mean, I get that part… but how does the sound come from different directions when it’s one flat driver? It mostly has a center and a surrounding area - how does a driver reproduce that?

      • SupOrSalad@alien.topB · 11 months ago

        There are the timing and volume differences between the ears, which have already been mentioned. But also, if you were to listen to a source in real life from different locations, the response at each ear would be different as well.

        Here’s an example from a KEMAR, with free-field frequency response measurements at different positions. This is just the left ear, showing how the response changes at different positions and distances from the head. https://imgur.com/a/Lj8Di0R

        So as a source changes its location, not only do the timing and volume change for each ear, but the sound itself changes for each ear too. Our brain can interpret and compare all the information from both responses to pinpoint the location of the source. It’s really interesting because even though the sound is changing, your brain still hears it as the same sound.

        With headphones it’s a little different, since the sound localization is coming from “nowhere”. But with binaural recordings or certain mixing, it does seem possible to simulate some of the localization effects in the recording itself.

      • goldfish_memories@alien.topB · 11 months ago

        Also by the time difference of when the sound waves reach each ear… it’s not a mystery at all; that’s how we hear things in real life as well.

  • Bennedict929@alien.topB · 11 months ago

    Headphones and IEMs are minimum-phase devices. Any differences observed in the time domain will be reflected in the frequency response.

  • ResolveReviews@alien.topB · 11 months ago

    While the best responses to this post have already been given, I just wanted to add one thing. The reason what you see on the graph for a given headphone “doesn’t tell the whole story” is also in part because it’s being measured in the condition of being on a particular ‘head’ - this is how we should think of measurement rigs.

    Each rig has its own head-related transfer function (HRTF), as do you, and these are likely to be different to some degree. Think of this as the effect of the head and ears on incoming sound; that effect for your head and ears is bound to be different from that effect of the measurement head. That’s not to say “we all just hear differently”, since we all typically have heads and ears that are… head and ear shaped, but there are still going to be some differences that can be meaningful.

    So, HRTF variation is one reason, but there is also another one, and that’s the headphone transfer function (HpTF). This is how the behavior of the headphone can change depending on the head that it’s on. You mention the well-measuring DCA headphones not sounding very good; one likely explanation for this is that the headphone itself behaves differently when it’s being worn by you - and with respect to those headphones in particular, I’d actually expect this to be the case (it was the same for me).

    It doesn’t mean the graph is wrong, or that categorically the product doesn’t sound like the graph to SOME person. It just means it doesn’t to you, because the condition of that headphone is different. Bottom line, HRTF and HpTF effects explain much of the whole “there’s more than just FR” concept - at least in cases where all else is reasonably equivalent.

  • bookworm6399@alien.topB · 11 months ago

    Eh, headphones and IEMs are minimum phase, so it really shouldn’t matter for the most part. Multi-driver setups might potentially be problematic, but in recent years companies have become pretty good at using different-length sound tubes for each driver to minimize such an effect. Also, phase measurements are done to check the very thing you’re discussing here, so it might be worth looking into them.

  • DJGammaRabbit@alien.topB · 11 months ago

    I EQ everything, so what metric should I look for to determine whether I’ll like a thing, if not an FR graph?

    I have no technical idea why I like my Grados more than cheaper sets. They’re just… clearer.

  • thatcarolguy@alien.topB · 11 months ago

    The frequency response is what is tricking your brain into thinking you are hearing time domain differences.

    Every single time I thought a headphone sounded slow or fast or whatever, it could be reversed with EQ.

  • audioen@alien.topB · 11 months ago

    You just don’t read good sites if you think people don’t check the phase response, resonances, and the like of drivers and headphone cups. ASR routinely measures harmonic distortion at various (usually very loud) listening levels, and there’s a group delay plot that illuminates the time-domain behavior of the system. Group delay is the useful way to look at phase response, as it is the derivative of phase with respect to frequency, and is ideally a flat line showing that all frequencies arrive at the listener simultaneously.
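
As a minimal sketch of what such a group delay plot computes (the second-order filter here is just a toy stand-in, not any particular headphone or amplifier):

```python
import numpy as np
from scipy import signal

fs = 48_000
# Toy system: a 2nd-order lowpass standing in for a driver/cup resonance
b, a = signal.butter(2, 5_000, fs=fs)

# Group delay is -d(phase)/d(frequency); scipy returns it in samples
w, gd_samples = signal.group_delay((b, a), fs=fs)
gd_us = gd_samples / fs * 1e6

print(f"Max group delay: {gd_us.max():.1f} us at {w[np.argmax(gd_us)]:.0f} Hz")
```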