You are tasked with improving the performance of a functional unit. The computation for the functional unit has 4 steps (A-D), and each step is indivisible. Assume there is no dependency between successive computations.

(5pts) What is the greatest clock rate speedup achievable with pipelining? You do not need to worry about register timing constraints (e.g., delay, setup, hold). Explain your reasoning.

(5pts) To maximize the clock rate, what is the minimum number of pipeline registers you would use? Where would you insert the registers (draw or describe) in the datapath provided for this functional unit? Why not use fewer or more pipeline stages?
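The reasoning behind both parts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step latencies below are hypothetical placeholders (the real values come from the provided datapath, which is not reproduced here), the function names are made up for this sketch, and only registers between steps are counted (no input or output registers). Because each step is indivisible, the pipelined cycle time cannot drop below the slowest step, so the best clock rate speedup is the sum of the step latencies divided by the largest one. Adjacent steps can share a stage as long as their combined delay does not exceed that slowest step, which is what keeps the register count minimal.

```python
def max_clock_speedup(latencies):
    """Unpipelined cycle time (sum of all steps) divided by the best
    pipelined cycle time (the slowest indivisible step)."""
    return sum(latencies) / max(latencies)


def min_register_placement(latencies):
    """Greedily merge adjacent steps into one stage while the stage delay
    stays at or below the slowest step.

    Returns a list of indices i meaning "place a register between step i
    and step i + 1". Only internal registers are counted (an assumption).
    """
    limit = max(latencies)      # cycle time is bounded by the slowest step
    boundaries = []
    stage_delay = 0
    for i, delay in enumerate(latencies):
        if stage_delay + delay > limit:
            # Adding this step would stretch the cycle time, so close the
            # current stage with a register before it.
            boundaries.append(i - 1)
            stage_delay = 0
        stage_delay += delay
    return boundaries


# Hypothetical latencies for steps A, B, C, D (arbitrary time units).
example_latencies = [2, 5, 1, 3]
example_speedup = max_clock_speedup(example_latencies)
example_registers = min_register_placement(example_latencies)
print(example_speedup)      # 11 / 5
print(example_registers)    # register positions, by preceding step index
```

With these placeholder numbers the sketch places registers after A and after B, giving three stages (A, B, C+D). Fewer registers would force some stage to exceed the slowest step and lower the clock rate; more registers would add latency and hardware without raising the clock rate, since the slowest step already sets the cycle time.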