(Ignore this paragraph
I used the preprocessor symbol KLUDGE to get around a problem I had trying
to use mpz functions from 1.3 and mpn functions from 2.0.  They wanted to
use different versions of rshift.  You'll need to clean this up when you merge
the code.

I no longer use umul_pphmm.  It would be good, however, if something like
what you did for 16 bit multipliers can be done for 8 bit and even maybe 4 bit
multipliers for mpn_submul for sparc and any other risc machine w/o a multiply
operation.  Currently mpn_bmodgcd runs a little slower on the sparc than it did
before, but I think it can be made faster by this fix.

Another prime candidate for speedup is mpn_sub_b_rsh.

It tests clean on sparc and hppa1.1.  I didn't test it on any CISC yet.
