I'm not familiar with the phrase "image response." Is there a formal definition for the phrase "image response"?
 You decimated your original lowpass 'b' coefficients to generate your 'b_down' coefficients. (The 'b_down' coefficients exhibit nonlinear phase in the frequency domain.) You did not use the 'b_down' coefficients in your code so I wonder, "What purpose is served by your 'b_down' coefficients?"
 You convolved your lowpass 'b' coefficients with your highpass 'b_hp' coefficients to produce your 'u' coefficients. Then you decimated the 'u' coefficients to generate your 'u_down' coefficients. Did you notice that your 'u' and 'u_down' coefficients are asymmetrical and will have nonlinear phase in the frequency domain? Is that a problem?
 What does the frequency magnitude response of the 'u_down' coefficients tell us? That is, how do we interpret that frequency magnitude response?
1. The definition I would propose for image response is the sum of the undesired component levels that fall in-band due to decimation, relative to the desired response (0 dB), where the undesired signal is only the signal generated by the test fixture. The test signal is equivalent to white noise filtered by the HPF.
So this is a bit different from inputting a single sine wave at some frequency above fs_out/2, which would only be one component of the total possible undesired energy.
2. The purpose of b_down is to compute the in-band decimator response H1 at fs/4. This is the blue line at the top of the plot in Figure 4.
3. I did not notice. Not sure if it matters.
4. If I can claim that the test signal is a valid signal, then u_down is just the decimator output for that test input, and the FFT of u_down is the spectrum of the decimator output. By the way, we get the same magnitude result using u_down = u(3:4:end), and a slightly different magnitude using u_down = u(2:4:end).
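One possible explanation for the matching magnitudes, sketched with made-up stand-in coefficients (the b and b_hp below are assumptions for illustration, not the filters from the post): if u is symmetric and its length is 3 more than a multiple of 4, then u(3:4:end) is just u(1:4:end) reversed in time, and reversing a real sequence does not change its FFT magnitude. The u(2:4:end) phase picks a different set of taps, so its magnitude can differ.

```matlab
% Stand-in coefficients (assumed for illustration, NOT the original 'b' and 'b_hp')
k    = 1:31;
b    = 0.5 - 0.5*cos(2*pi*k/32);   % symmetric lowpass-like taps, length 31
b_hp = [-1 -2 6 -2 -1];            % symmetric highpass-like taps, sum is zero
u    = conv(b, b_hp);              % symmetric, length 35 (3 more than a multiple of 4)

% Magnitude spectra of three decimation phases (scaled by 4 as in the post)
Nfft = 64;
U1 = abs(fft(4*u(1:4:end), Nfft));
U2 = abs(fft(4*u(2:4:end), Nfft));
U3 = abs(fft(4*u(3:4:end), Nfft));

% Phase 3 is phase 1 reversed in time, so the magnitudes agree
flip_err = max(abs(u(1:4:end) - fliplr(u(3:4:end))));
d13 = max(abs(U1 - U3));   % difference between phase 1 and phase 3 magnitudes
d12 = max(abs(U1 - U2));   % difference between phase 1 and phase 2 magnitudes

fprintf('flip error  = %g\n', flip_err);
fprintf('max |U1-U3| = %g\n', d13);
fprintf('max |U1-U2| = %g\n', d12);
```

With these stand-ins the first two numbers should come out at rounding-error level and the third should not. Whether the same reasoning applies to the original filters depends on the actual length and symmetry of u.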
Regarding the non-symmetric b_down and u_down: if I change two lines of code:
b_down = 4*b(4:4:end);   % instead of b(1:4:end)
u_down = 4*u(2:4:end);   % instead of u(1:4:end)
Then they are symmetric. This has a small effect on both the passband response and the image response.
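To see why the starting index matters, here is a minimal sketch using stand-in coefficients (assumed for illustration, not the original b and b_hp; the helpers is_sym and last_idx are hypothetical names). For a symmetric vector of length N, the taps x(p:M:end) are guaranteed symmetric when the first and last selected indices add up to N+1, because the selected set is then mirrored about the center.

```matlab
% Stand-in coefficients (assumed for illustration, NOT the original 'b' and 'b_hp')
k    = 1:31;
b    = 0.5 - 0.5*cos(2*pi*k/32);   % symmetric, length 31
b_hp = [-1 -2 6 -2 -1];            % symmetric, length 5
u    = conv(b, b_hp);              % symmetric, length 35

% True when v equals its own reversal (within rounding)
is_sym = @(v) max(abs(v - fliplr(v))) < 1e-12;

% Last index selected by p:M:N
last_idx = @(N, p, M) p + M*floor((N - p)/M);

% Index rule: first + last == N + 1 means the selected taps mirror about the center
rule_b1 = (1 + last_idx(31, 1, 4) == 32);   % 1 + 29
rule_b4 = (4 + last_idx(31, 4, 4) == 32);   % 4 + 28
rule_u1 = (1 + last_idx(35, 1, 4) == 36);   % 1 + 33
rule_u2 = (2 + last_idx(35, 2, 4) == 36);   % 2 + 34

% Direct check on the decimated taps
sym_b1 = is_sym(b(1:4:end));
sym_b4 = is_sym(b(4:4:end));
sym_u1 = is_sym(u(1:4:end));
sym_u2 = is_sym(u(2:4:end));

fprintf('b(1:4:end): rule %d, symmetric %d\n', rule_b1, sym_b1);
fprintf('b(4:4:end): rule %d, symmetric %d\n', rule_b4, sym_b4);
fprintf('u(1:4:end): rule %d, symmetric %d\n', rule_u1, sym_u1);
fprintf('u(2:4:end): rule %d, symmetric %d\n', rule_u2, sym_u2);
```

The lengths 31 and 35 were picked so that the offsets 4 and 2 happen to satisfy the rule, matching the two changed lines above; with other filter lengths a different offset would be needed.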