Speeding Up Floating-Point Division With In-lined Iterative Algorithms

Robert Enenkel, Allan Martin
IBM(R) Toronto Lab
(c) Copyright IBM Corp. 2005


Outline

- Hardware floating-point division
- The case for software division
- Software division algorithms
- Special cases/tradeoffs
- Performance results
- Automatic generation


Hardware Division

- PPC fdiv, fdivs
- Advantages
    - accurate (correctly rounded)
    - handles exceptional cases (Inf, NaN)
    - lower latency than SW
- Disadvantages
    - occupies FPU completely
    - inhibits parallelism


Alternatives to HW division

- Vector libraries
    - MASS
    - higher overhead, greater speedup
- In-lined software division
    - low overhead, medium speedup


Rationale for Software Division

- Write SW division algorithm in terms of HW arithmetic instructions
    - Newton's method or Taylor series
- Latency will be higher than HW division
- But... SW instructions can be interleaved, so throughput may be better
- Requires enough independent instructions to interleave
    - loop of divisions
    - other work


Newton's Method

- To find x such that f(x) = 0,
- Initial guess x_0
- x_{n+1} = x_n - f(x_n)/f'(x_n), n = 0, 1, 2, ...
- Provided x_0 is close enough
    - x_n converges to x
    - It converges quadratically: |x_{n+1} - x| < c |x_n - x|^2
    - Number of bits of accuracy doubles with each iteration


Newton's Method

[Figure: graphic not recoverable]


Newton Iteration for Division

- For 1/b, let f(x) = 1/x - b
- For a/b, use a*(1/b) or f(x) = a/x - b
- Algorithm for 1/b
    - x_0 ~ 1/b initial guess
    - e_0 = 1 - b*x_0
    - x_1 = x_0 + e_0*x_0
    - e_1 = e_0*e_0
    - x_2 = x_1 + e_1*x_1
    - etc...


How Many Iterations Needed?

- Power5 reciprocal estimate instructions
    - FRES (single precision), FRE (double prec.)
    - |relative error| <= 2^(-8)
- Floating-point precision
    - single: 24 bits
    - double: 53 bits
- Newton iterations
    - error: 2^(-16), 2^(-32), 2^(-64), 2^(-128)
    - single: 2 iterations for 1 ulp
    - double: 3 iterations for 1 ulp
    - +1 iteration for correct rounding (0.5 ulps)

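
As a rough illustration of the Newton recurrence for 1/b and of the error sequence 2^(-8), 2^(-16), 2^(-32), ..., the following Python sketch runs the iteration in double precision. The function reciprocal_estimate is a hypothetical stand-in for a hardware estimate instruction such as FRE: it simply perturbs 1/b by a relative error of 2^(-8), an assumption made so the example runs anywhere. The names newton_reciprocal and relative_error are also illustrative, and this is not the compiler's actual instruction sequence.

```python
from fractions import Fraction


def reciprocal_estimate(b, rel_err=2.0 ** -8):
    """Hypothetical stand-in for a hardware estimate instruction (e.g. FRE).

    Assumption for this sketch: take 1/b and perturb it by rel_err.
    """
    return (1.0 / b) * (1.0 + rel_err)


def newton_reciprocal(b, iterations):
    """Approximate 1/b with x_{k+1} = x_k + e_k*x_k and e_{k+1} = e_k*e_k."""
    x = reciprocal_estimate(b)
    e = 1.0 - b * x          # e_0 = 1 - b*x_0
    for _ in range(iterations):
        x = x + e * x        # x_{k+1} = x_k + e_k*x_k
        e = e * e            # e_{k+1} = e_k^2, so the error squares each step
    return x


def relative_error(x, b):
    """Exact relative error |x - 1/b| / (1/b) = |x*b - 1|, using rationals."""
    return abs(Fraction(x) * Fraction(b) - 1)


# Print the relative error after 0, 1, 2 and 3 iterations for b = 3.
for n in range(4):
    err = relative_error(newton_reciprocal(3.0, n), 3.0)
    print(n, float(err))
```

The printed errors should shrink roughly as 2^(-8), 2^(-16), 2^(-32) and then level off near double-precision rounding error, because plain Python arithmetic rounds every multiply and add separately.
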
Taylor Series for Reciprocal

- x_0 ~ 1/b initial guess
- e = 1 - b x_0
- 1/b = x_0/(b x_0) = x_0 (1/(1-e)) = x_0 (1 + e + e^2 + e^3 + e^4 + ...)
- Algorithm (6 terms)
    - e = 1 - b*x_0
    - t_1 = 0.5 + e*e
    - q_1 = x_0 + x_0*e
    - t_2 = 0.75 + t_1*t_1
    - t_3 = q_1*e
    - q_2 = x_0 + t_2*t_3


Speed/Accuracy tradeoff

- IBM compilers have -qstrict/-qnostrict
- -qstrict: SW result should match HW division exactly
- -qnostrict: SW result may be slightly less accurate for speed


Exceptions

- Even when a/b is representable...
- 1/b may underflow
    - a ~ b ~ huge, a/b ~ 1, 1/b denormalized
    - Causes loss of accuracy
- 1/b may overflow
    - a, b denormalized, a/b ~ 1, 1/b = Inf
    - Causes SW algorithm to produce NaN
- Handle with tests in algorithm
    - Use HW divide for exceptional cases

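
A minimal sketch of how the Taylor-series reciprocal and a range test might fit together, under stated assumptions. Expanding the six steps gives q_2 = x_0 (1 + e + e^2 + ... + e^6), which the function taylor_reciprocal evaluates literally. The names reciprocal_estimate, taylor_reciprocal and guarded_divide, and the thresholds B_MIN and B_MAX, are hypothetical choices for illustration; they are not the tests used by the swdiv built-ins. Ordinary Python division plays the role of the hardware divide.

```python
import math


def reciprocal_estimate(b, rel_err=2.0 ** -8):
    """Hypothetical stand-in for a hardware reciprocal estimate (e.g. FRE)."""
    return (1.0 / b) * (1.0 + rel_err)


def taylor_reciprocal(b):
    """Evaluate x0*(1 + e + ... + e^6) with the six operations listed above."""
    x0 = reciprocal_estimate(b)
    e = 1.0 - b * x0
    t1 = 0.5 + e * e
    q1 = x0 + x0 * e
    t2 = 0.75 + t1 * t1      # equals 1 + e^2 + e^4
    t3 = q1 * e              # equals x0*e*(1 + e)
    return x0 + t2 * t3


# Illustrative thresholds (assumption): for |b| in [B_MIN, B_MAX] the
# reciprocal 1/b is a finite, normal double.
B_MIN = 2.0 ** -1000
B_MAX = 2.0 ** 1000


def guarded_divide(a, b):
    """Use the software reciprocal only when b is in the assumed safe range."""
    if not math.isfinite(a) or not (B_MIN <= abs(b) <= B_MAX):
        # Exceptional case: fall back to ordinary ("hardware") division.
        # As usual in Python, b == 0 raises ZeroDivisionError here.
        return a / b
    return a * taylor_reciprocal(b)


tiny = 5e-324                                # smallest positive denormal
print(tiny * taylor_reciprocal(tiny))        # unguarded path: 1/b overflows
print(guarded_divide(tiny, tiny))            # guarded path uses a / b
print(guarded_divide(10.0, 4.0))             # ordinary case, software path
```

The unguarded call shows the overflow case from the slide: the estimate of 1/b is Inf, and the remaining steps turn it into NaN. The guarded version avoids this by routing out-of-range divisors to the ordinary divide.
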
Algorithm variations

- User callable built-in functions
    - swdiv(a,b): double precision, checking
    - swdivs(a,b): single precision, checking
    - swdiv_nochk(a,b): double, non-checking
    - swdivs_nochk(a,b): single, non-checking
- Accuracy of swdiv, swdiv_nochk depends on -qstrict/-qnostrict
- _nochk versions faster but have argument restrictions


Accuracy and Performance

                        Power5          Power4          Power5          Power4
                        speedup ratio   speedup ratio   ulps max error  ulps max error
swdivs                  1.07            1.05            0.5             0.5
swdivs_nochk            1.46            1.28            0.5             0.5
swdiv strict            1.05                            0.5
swdiv nostrict          1.50                            1.5
swdiv_nochk strict      1.51                            0.5
swdiv_nochk nostrict    1.77                            1.5


Automatic Generation of Software Division

- The swdivs and swdiv algorithms can also be automatically generated by the compiler
- Compiler can detect situations where throughput is more important than latency


Automatic Generation of Software Division

- In straight-line code, we use a heuristic that calculates how much FP can be executed in parallel
    - independent instructions are good, especially other divides
    - dependent instructions are bad (they increase latency)

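
To make the latency versus throughput argument concrete, here is a toy cycle-count model in Python. It is only a sketch: the machine model (one floating-point unit that issues one pipelined operation per cycle, and a hardware divide that blocks the unit for its whole latency) is a simplification, and the parameter values used below are made up for illustration. They are not Power4 or Power5 figures, and this is not the compiler's heuristic.

```python
def hardware_cycles(n_divides, divide_latency):
    """Non-pipelined divider: each divide occupies the FPU for its full latency."""
    return n_divides * divide_latency


def software_cycles(n_divides, ops_per_divide, op_latency):
    """Cycles to finish n independent software divides on one pipelined FPU.

    Each divide is modelled as a chain of dependent operations: an operation
    may issue only after the previous one in the same chain has completed.
    At most one operation issues per cycle.
    """
    ready_at = [0] * n_divides               # earliest issue cycle per chain
    remaining = [ops_per_divide] * n_divides
    cycle = 0
    finish = 0
    while any(remaining):
        ready = [i for i in range(n_divides)
                 if remaining[i] and ready_at[i] <= cycle]
        if ready:
            # Issue from the ready chain with the most work left.
            i = max(ready, key=lambda j: remaining[j])
            remaining[i] -= 1
            ready_at[i] = cycle + op_latency
            finish = max(finish, ready_at[i])
        cycle += 1
    return finish


# Hypothetical parameters: 30-cycle hardware divide, software divide made of
# 10 dependent operations with 6-cycle latency each.
for n in (1, 8):
    print(n, hardware_cycles(n, 30), software_cycles(n, 10, 6))
```

With these made-up parameters the model gives 30 versus 60 cycles for a single divide (software has the higher latency) and 240 versus 85 cycles for eight independent divides (software has the better throughput), which is the qualitative behaviour the heuristic tries to exploit.
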
Automatic Generation of Software Division

- In modulo-scheduled loops, software-divide code can be pipelined, interleaving multiple iterations
- Divides are expanded if the divide does not appear in a recurrence (cyclic data-dependence)


Summary

- Software divide algorithms
    - user callable
    - compiler generated
- Loops of divides
    - up to 1.77x speedup
- UMT2K benchmark
    - 1.19x speedup