Yes, talking about averages sounds like it might lead a prover astray into reasoning about the average size of the compressed data. The average that matters here, though, is how many inputs map to the same output.
Compressing (n+1)-bit input to n-bit output makes the average output correspond to 2^(n+1)/2^n = 2 inputs. By Dijkstra's rephrasing of the pigeonhole principle (the maximum is at least the average), the maximum number of inputs colliding on the same output must be at least 2.
Or we could make it more robust and say we're compressing to an n-bit-or-smaller output. There are 2^0 + 2^1 + ... + 2^n = 2^(n+1) - 1 bit strings of length at most n, so the average output corresponds to 2^(n+1)/(2^(n+1) - 1) inputs. That average is still strictly greater than 1, so the maximum number of inputs colliding on the same output must also be strictly greater than 1; since it is a natural number, it must be at least 2.
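The counting argument above can be checked by brute force for small n. Here is a sketch that feeds every (n+1)-bit string through a candidate compressor and reports a bucket with more than one preimage; the `find_collision` helper and the trailing-bit-dropping compressor are illustrative, not part of any real compression library.

```python
from itertools import product

def all_bitstrings(length):
    """All bit strings of exactly `length` bits, in lexicographic order."""
    return ("".join(bits) for bits in product("01", repeat=length))

def find_collision(compress, n):
    """Map every (n+1)-bit input through `compress` and return the
    preimages of some output that is hit by more than one input."""
    buckets = {}
    for s in all_bitstrings(n + 1):
        out = compress(s)
        assert len(out) <= n, "not a compressor: output longer than n bits"
        buckets.setdefault(out, []).append(s)
    # Pigeonhole: 2^(n+1) inputs land in at most 2^(n+1) - 1 buckets
    # (the bit strings of length <= n), so some bucket holds >= 2 inputs.
    return next(v for v in buckets.values() if len(v) > 1)

# Toy compressor for illustration: drop the last bit.
collision = find_collision(lambda s: s[:-1], 3)
```

The `next(...)` call cannot raise StopIteration: since the average bucket load exceeds 1, at least one bucket with two or more inputs always exists, for any compressor you pass in.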