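      ------------------------------------------------------------------

                            OptiVec for C / C++

                            Shareware Version 5

      ------------------------------------------------------------------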

Contents:
     1. Introduction
     2. System requirements
     3. Installation
     4. Running the examples
     5. Documentation
     6. What's New?
     7. Copyright
     8. Registered Version

1. Introduction
---------------
Use vector functions instead of loops - if it matters to you how fast and
how accurately your data are processed!
The largest portion of almost all programs processing numerical data
consists of loops. Replace these loops with the specialized vector
and matrix functions of OptiVec, and exploit all the advantages of
vectorized programming:

- OptiVec was almost entirely written in Assembly language. The result is
  highly optimized and numerically accurate code, running 2-3 times faster
  than compiled loops.

- Fully vectorized forms of all operators and functions of the language
  C/C++ for all integer and floating-point (including complex!) data types

- Additionally, many other real and complex math functions, statistics,
  analysis, FFT techniques, graphics - in total more than 3000 functions!

- Simple and logical syntax, for example
  "VF_exp( Y, X, size );"      for Y[i] = exp( X[i] ) with "float" arrays,
  "VD_FFT( Y, X, size, 1 );"   for double-precision forward FFT
  "ME_mulM( MZ, MX, MY, htX, lenX, lenY );"
                               for extended-precision matrix multiplication

2. System requirements
----------------------
This version of OptiVec is designed for PC systems
equipped with a single- or multi-core CPU (at least 486DX / Pentium / Athlon;
for the multi-core library: AMD64x2, Core2Duo or higher).

Depending on your choice when ordering or downloading, this package
includes the libraries to be used either with
- the Borland / CodeGear  series of C/C++ compilers
  (Borland C++ 5 or higher,  C++ Builder, Developer Studio, Turbo C++)
- or with the Microsoft Visual C++ compiler series (MSVC 5.0 or higher,
  Visual Studio)


3. Installation
---------------
Please run the installation program, INSTALL.EXE.
INSTALL.EXE moves all OptiVec files into their correct subdirectory
and starts the clock for the trial period.
You may change the default directory structure by hand when the
installation is complete.
To install OptiVec on Windows NT, 2000, XP, Vista etc., you need to log in
with administrator privileges.

After you have completed the installation, you must set the include-file
search path according to your OptiVec directory choice.
If, for example, your OptiVec directory is C:\OPTIVEC, then your additional
include-file search path is:
C:\OPTIVEC\INCLUDE

Add this path to the standard settings in the menu
"Extras / Options / Directories" in MS Visual C++ 5.0,
"Project / (Configuration) Settings / C/C++ / General / Additional
Include Paths"  in Visual Studio,
or
"Options / Project / Directories" of Borland C++ and BC++ Builder
(use a semicolon to separate entries in these fields.)


Add the OptiVec libraries to your project, as described below for the
example files.
For MS Visual C++, you also have to include the Windows API import library:
In the menu   Project / (Configuration) Settings / Linker /
Object and Library modules,
you must add    user32.lib   (if it is not yet there).
Otherwise you would get the linker error
LNK2001: Unresolved external symbol __imp__MessageBoxA@16
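
As an alternative to the menu entry, Visual C++ also accepts a request
for an import library directly in the source code through its
"#pragma comment( lib, ... )" directive. This is a compiler feature and
not something taken from the OptiVec documentation; the helper name
user32_requested is hypothetical and only reports whether the pragma was
compiled in.

```c
/* Visual C++ only: ask the linker to add the Windows API import library.
   Other compilers skip the pragma. */
#ifdef _MSC_VER
  #pragma comment( lib, "user32.lib" )
  #define USER32_REQUESTED 1
#else
  #define USER32_REQUESTED 0
#endif

/* Hypothetical helper: returns 1 if the pragma above was active, else 0. */
int user32_requested( void )
{
    return USER32_REQUESTED;
}
```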

If you wish to use OptiVec for more than one target compiler (for example,
both for Visual C++ and for Borland C++,  or both for Borland C++ and Delphi),
you can install these OptiVec versions all into the same directory, say
C:\OPTIVEC.


4. Running the examples
-----------------------

Check your installation by compiling and running the included demo files.
Follow the instructions in the header of the respective file.
If you get compiler errors like "Cannot open include file ...", or a
linker error "Unresolved external", the library and include paths are
probably not set correctly or the libraries are not properly included
in your project.

Depending on your hardware, you have the choice between two versions of
the OptiVec libraries: either the general-purpose debug library, which
runs on all CPUs, even down to 486DX or AMD K6 / Athlon, or the new
multi-core optimized version, which requires at least an AMD 64 x2
or a Core2 Duo processor.
See chapter 1.1.2 of HANDBOOK.HTM for details about the multi-core library.

OptiVec for Microsoft Visual C++:
 -  open the workspace VDEMO.DSW.
    This is a Visual C++ 6 workspace. Later versions of Visual Studio will
    convert it into their own format. When prompted, answer
    "Yes to all" to accept this automatic conversion.

 -  The workspace contains four projects:
    VDEMO,  FITDEMO,   CDEMO,  and MANDEL.

    VDEMO   gives a general introduction into OptiVec vector functions,
    FITDEMO demonstrates data fitting with OptiVec, 
    CDEMO   shows how to use CMATH functions for complex-number applications,
    MANDEL  shows how a Mandelbrot plot can be coded with CMATH

    Compile and run all four of them separately.

 -  If your hardware supports it, you may wish to replace the
    general-purpose debug library  OVVC4D.LIB  with the multi-core
    optimized library,  OVVC7M.LIB.
    To do that, go to the menu
    Project Settings / Linker / Object/Library modules  (Visual C++ 6) or
    Project Settings / Configuration / Linker / Input / Additional dependencies
         (Visual Studio 2005)


OptiVec for Borland C/C++:
  VDEMOB6.BPR is a project for BC++ Builder 6+ and Borland Dev. Studio
              (Developer Studio will automatically convert the project
              into Dev. Studio format),
  VDEMOB.BPR  is a BC++ Builder 4+ project.

  For BC++ Builder 6+ and Borland Developer Studio,
      open the project VDEMOB6.BPR,
      check that the OptiVec libraries are included with their
      correct paths: VCFS.LIB and either VCF4D.LIB (general-purpose, debug)
      or VCF7M.LIB (multi-core optimized),
      compile and run.

  For BC++ Builder 4+, open the project VDEMOB.BPR, compile and run.

  For the command-line compiler, compile VDEMOW.CPP by entering
      BCC32 -W -Iinclude vdemow.cpp lib\vcf4d.lib lib\vcfs.lib
      (or, for multi-core performance, vcf7m.lib instead of vcf4d.lib)
      and run the program by typing:  vdemow


Data-fitting demo for Borland C++:

    For Borland Dev. Studio or BC++ Builder 6+, open the project FITDEMB6.BPR.

    For BC++ Builder 4+, open the project FITDEMOB.BPR.

    For older Borland C++ versions or the command-line compiler, use
    FITDEMOW.CPP.
    Follow the analogous instructions given above for VDEMOW.CPP,
    VDEMOB.BPR and VDEMOB6.BPR.

Complex-number demo for Borland C++:

    For Borland Dev. Studio or BC++ Builder 6+, open the project CDEMOB6.BPR.

    For older Borland C++ versions or the command-line compiler, use
    CDEMO.CPP and follow the instructions given in the header of that file.

Mandelbrot demo for Borland C++:
    For Borland Dev. Studio or BC++ Builder 6+, open the project MANDELB6.BPR.

    For BC++ Builder 4+, open the project MANDELB.BPR.

    For older Borland C++ versions or the command-line compiler, use
    MANDELW.CPP.
    Follow the analogous instructions given above for VDEMOW.CPP,
    VDEMOB.BPR and VDEMOB6.BPR.


5. Documentation
----------------
The full OptiVec documentation is contained in the files
HANDBOOK.HTM, FUNCREF.HTM, MATRIX.HTM, and CMATH.HTM to be read with
an HTML browser like Firefox, IE, Opera or Netscape.


6. What's New?
--------------
Version 5.2
-  Introduction of the dimension-checking debug library. The single most
   frequent cause of hard-to-find errors in vector and matrix programs
   is a dimension mismatch, leading to read or write operations outside
   the vector / matrix boundaries. The resulting heap corruption usually
   causes errors or crashes far away from the offending operation,
   which often makes it very hard to find the cause of such errors.
   The dimension-checking libraries, marked by the letter "D" in the
   library name, perform a safety check at the beginning of every vector
   function called, making it possible to identify and fix these errors.
   For now, all vector functions and the most important matrix functions
   perform this check; the remaining matrix functions will follow in a
   later release.

-  Bug fix in MCD_read  (failed when the input numbers were written  in
                         brackets or braces)

-  Visual C++ version only: Bug fixes in several VQI_ functions. These bugs
   did not usually affect the results of the VQI_ operations themselves,
   but could lead to program failures later.
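
The idea behind the dimension check introduced in Version 5.2 can be
sketched as follows. This is only an illustration: the type CheckedVec
and the function checked_addV are hypothetical, and the way the "D"
libraries actually keep track of vector sizes is not described in this
file.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical checked vector: the element count travels with the data. */
typedef struct {
    float  *data;
    size_t  size;
} CheckedVec;

/* Illustrative vector addition Z[i] = X[i] + Y[i] with a safety check
   performed before any element is touched.
   Returns 0 on success, -1 if "size" exceeds one of the vectors. */
int checked_addV( CheckedVec Z, CheckedVec X, CheckedVec Y, size_t size )
{
    size_t i;
    if( size > Z.size || size > X.size || size > Y.size )
    {
        fprintf( stderr, "checked_addV: size %lu exceeds a vector dimension\n",
                 (unsigned long) size );
        return -1;      /* refuse to read or write past a boundary */
    }
    for( i = 0; i < size; i++ )
        Z.data[i] = X.data[i] + Y.data[i];
    return 0;
}
```

The mismatch is reported at the offending call instead of surfacing
later as heap corruption.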

Version 5.1.1
-  Stability and accuracy improvement in V?_polyfit

-  Bug fix and stability improvement in the "breakout" part of
   V?_nonlinfit,  M?_nonlinfit,  V?_multiNonlinfit,  M?_multiNonlinfit

Version 5.1:
-  You should call V_initMT at the beginning of any multi-threaded
   application making use of OptiVec functions. Previously, this was
   necessary only if you used the Multi-Core libraries of OptiVec.
   Now, V_initMT also performs initializations which avoid previously
   encountered thread-safety issues inside the memory management functions
   like V?_vector.

-  Calculation of the center of gravity, both for vectors and for matrices:
   V?_centerOfGravityInd,  V?_centerOfGravityV,
   M?_centerOfGravityInd,  M?_centerOfGravityV
   The V?_centerOfGravityInd variant uses the element indices as the axis,
   whereas V?_centerOfGravityV takes a specified X axis.

-  P8 libraries (requiring Intel Core2xxx or AMD64xxx processors)
   are included. Any operation involving single-precision complex-number
   multiplication is roughly 50% faster, at the cost of 1-2 digits of
   accuracy.
   Double-precision complex multiplications benefit, too, but much less
   than single-precision ones.

-  The speed of the Fourier Transform routines was improved once more.
   This latest tweak makes itself felt mostly for smaller vectors and for
   2D FFT.

-  Alternative forms of the functions for element rotations. The new forms
   take the necessary buffer memory as an argument instead of requesting
   it from the operating system each time they are called:
   V?_rotate_buf,
   M?_Rows_rotate_buf,
   M?_Cols_rotate_buf

-  VecObj: integralV is now a member function of the vector Y, whose
   integral over X is being evaluated. X is passed as argument.
   Previously, this function was defined the other way round.

-  Multi-Core Libraries: All arithmetic functions are now also optimized
   for multi-core

-  Bug fixes in
   VE_polyinterpol,  VE_ratinterpol,
   MC?_read (which would sometimes fail if the input numbers were written
             with braces or brackets)

-  Multi-Core Libraries only: Bug fixes in:
   V?_powexp
   V?x_pow2
   V?x_pow10

-  P7 libraries only:
   Bug fix in MD_TmulM  (large MY with uneven lenY caused an error)

-  P7 libraries, used with non-aligned vectors (e.g., static arrays) only:
   Bug fixes in:
   MD_SVsolve, affecting also MD_solveBySVD, MD_safeSolve, VD_linfit,
   and VD_nonlinfit,
   VCD_addC, VCD_subC
   V_UStoUL, V_UStoU,
   MD_TmulM  (large MY with uneven lenY caused an error)
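
As an illustration of the two center-of-gravity variants introduced in
Version 5.1, the following plain-C sketch assumes the usual definition,
sum( x[i] * m[i] ) / sum( m[i] ). The function names, the argument order
and the return value of 0 for a vector whose elements sum to zero are
assumptions of this sketch, not the OptiVec prototypes.

```c
#include <stddef.h>

/* Hypothetical reference: the element indices 0 .. size-1 are the axis,
   the vector elements are the "masses". */
double ref_centerOfGravityInd( const double *X, size_t size )
{
    double sum = 0.0, moment = 0.0;
    size_t i;
    for( i = 0; i < size; i++ )
    {
        sum    += X[i];
        moment += (double) i * X[i];
    }
    return (sum != 0.0) ? moment / sum : 0.0;   /* 0 is an arbitrary choice */
}

/* Hypothetical reference: X is an explicit axis, Y holds the "masses". */
double ref_centerOfGravityV( const double *X, const double *Y, size_t size )
{
    double sum = 0.0, moment = 0.0;
    size_t i;
    for( i = 0; i < size; i++ )
    {
        sum    += Y[i];
        moment += X[i] * Y[i];
    }
    return (sum != 0.0) ? moment / sum : 0.0;   /* 0 is an arbitrary choice */
}
```

For four equal elements, the index-based variant yields 1.5, the middle
of the index range 0 .. 3.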

Version 5:
-  The multi-core optimized library is introduced. For now, practically
   all "mathematical" functions, all Fourier-Transform methods, and
   the matrix multiplication functions are fully multi-core optimized.
   The remaining routines will be updated in subsequent versions.

-  Bug fixes in:
   V?_Kepler, V?x_Kepler
   M?sym_eigenvalues

Version 4.4.5:
-  The non-linear data-fitting functions are now thread-safe

-  V?sym_eigenvalues signals failure (caused by singular or otherwise
   illegal input matrices) by a non-zero return value, instead of ending
   the program.

-  Visual C++ version only: compatibility problem with <crtdbg.h> fixed.
   You can now detect memory leaks by including <crtdbg.h> with the
   _CRTDBG_MAP_ALLOC option on. If you do so, any calls to the VecObj member
   function X.free() must be replaced by the synonymous member function
   X.dealloc();
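
With Visual C++, the leak detection mentioned for Version 4.4.5 is
typically switched on as sketched below. _CRTDBG_MAP_ALLOC, <crtdbg.h>
and _CrtSetDbgFlag belong to the Microsoft C run-time library; the helper
name enable_leak_report is hypothetical. With other compilers, or in
non-debug builds, the helper does nothing and returns 0.

```c
#ifdef _MSC_VER
  #define _CRTDBG_MAP_ALLOC      /* must precede the includes below */
  #include <stdlib.h>
  #include <crtdbg.h>
#endif

/* Hypothetical helper: requests a leak report at program exit.
   Returns 1 if the request was made, 0 if it is not available. */
int enable_leak_report( void )
{
#if defined(_MSC_VER) && defined(_DEBUG)
    _CrtSetDbgFlag( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF );
    return 1;
#else
    return 0;
#endif
}
```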

Version 4.4,  4.4.2, 4.4.3:
-  New organisation and naming of OptiVec Win32 libraries, facilitating
   the migration from earlier versions of the target compilers to the
   more recent ones.

-  Faster matrix decomposition and linear system solver for large matrices

-  bug fixes in VLI_not,  VUL_not (Pentium 4 only),
   MDp_FFTtoC (Pentium III, 4 only)
   VFx_exp (Pentium III, 4 only)
   MDsym_eigenvalues (Visual C++ version only)
   M?_mulMT and M?_TmulM (affected matrices with only one column)

-  additional run-time names for some complex-number math operators
   (which appeared to be missing in some BC++ versions due to naming
   convention changes), so that now all of them should be available
   in all target compiler versions

Version 4.3:
a) bug fix in M?_Row_insert and M?_Row_delete

Version 4.1:
b) Faster multiplication of large matrices:
   M?_mulM, M?_TmulM, M?_mulMT, and M?_TmulMT now handle matrices of the
   order 1024*1024 up to 10 times as fast as previous versions.
   Small to middle-size matrices are not affected.
c) bug fix in V?_median: rare hanging in an infinite loop is now prevented

Version 4.0:
d) Pentium 4 libraries
e) Mixed floating-point/integer arithmetic operations like VF_addVI, VD_mulVUS
f) The rounding-to-integer functions now employ "silent" saturation instead of
   generating an error message in the case of overflow.
g) Visual C++ version only: the data type quad is now equivalent to __int64.
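
The "silent" saturation of the rounding functions mentioned under f) can
be pictured as follows: a value too large for the target type is clamped
to the largest (or smallest) representable integer instead of raising an
error. ref_round_sat and ref_roundtoI are hypothetical names; the
rounding of halves away from zero and the result 0 for NaN are
assumptions of this sketch.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical reference: round to the nearest int, clamping on overflow. */
int ref_round_sat( double x )
{
    if( x != x )                 return 0;        /* NaN: arbitrary choice */
    if( x >= (double) INT_MAX )  return INT_MAX;  /* saturate upwards      */
    if( x <= (double) INT_MIN )  return INT_MIN;  /* saturate downwards    */
    return (int)( x >= 0.0 ? x + 0.5 : x - 0.5 ); /* halves away from zero */
}

/* Vector form: Y[i] = saturated, rounded X[i] for i = 0 .. size-1. */
void ref_roundtoI( int *Y, const double *X, size_t size )
{
    size_t i;
    for( i = 0; i < size; i++ )
        Y[i] = ref_round_sat( X[i] );
}
```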

Version 3.3.4 / 3.3.5:
h) Bug fix in V?_spectrum (only 32-bit versions affected)
i) Compatibility problem with Visual C++ 2003 fixed

Version 3.3.2 / 3.3.3:
k) Bug fixes in MDp_FFT and MEp_FFT  (MFp_FFT was not affected),
   in the P6 version of VF_rotateCoordinates, and in V_free (there
   was a bug affecting thread safety in multithread applications)
l) New function VF_powexp for the calculation of x^r * exp(x)

Version 3.3:
m) Extension of VecObj for matrices
n) Parallel-enhanced series of Fourier-Transform methods, both one- and
   two-dimensional; 1.5 to 2.5 times as fast as previous FFT routines
o) Bug fixes in the complex accumulation functions and several of the
   mixed-type accumulation functions (VCF_accV, VF_accVLI etc.)
p) accumulation of two vectors at once: VF_acc2V
q) addition / subtraction of two vectors at once: VF_add2V, VF_sub2V
r) coordinate transformation: VF_rotateCoordinates, VCF_rotateCoordinates

Version 3.2.3:
s) Faster matrix inversion
t) MF_block_equM,  MF_block_equMT,  MF_equMblock,  MF_equMblockT etc.:
   functions to extract blocks from matrices or copy them back
   (you need no longer use MF_submatrix and MF_submatrix_equM for
   that purpose).
u) complex random numbers: VCF_random etc.;
   initialization of matrices with random numbers: MF_random etc.

Version 3.2.2:
v) Function MFsym_sqrt
w) Bug fixes in the P6 version of VFu_sqrt, VF_cmp_inclrange0C, ...CC,
   VF_cmp_exclrange0C, ...CC, VCDx_divV, and V?_derivC
x) All nonlinfit functions check if the "best guess" A values lie within
   the specified limits before chi-square is calculated for the first time.
   Thereby, possible failure due to input A values outside the limits is
   avoided.
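
The kind of range check described under x) might look as follows. All
names are hypothetical, and how the nonlinfit functions react to an
out-of-range start value is not stated in this file.

```c
#include <stddef.h>

/* Hypothetical pre-check for start parameters A[0 .. npars-1]:
   returns -1 if every A[i] lies within [Lower[i], Upper[i]],
   otherwise the index of the first parameter outside its limits. */
long first_param_outside_limits( const double *A, const double *Lower,
                                 const double *Upper, size_t npars )
{
    size_t i;
    for( i = 0; i < npars; i++ )
        if( A[i] < Lower[i] || A[i] > Upper[i] )
            return (long) i;
    return -1;
}
```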


7. Copyright
------------
The copyright owner of this product as a whole and of all its constituent
parts is
         OptiCode
         Dr. Martin Sander Software Development
         Steinachstr. 9A
         D-69198 Schriesheim
         Germany
         e-mail: optivec@gmx.de

This Shareware version of OptiVec is freely distributable in unchanged form.
For the distribution of applications created using OptiVec, you need the
registered version. The detailed licence terms are described in chapter 1.2
of the file HANDBOOK.HTM.


8. Registered Version
---------------------
If you like OptiVec and decide to use it, please be fair and register.
The registered version

-  has individually optimized libraries for special high-performance
   techniques and for various degrees of processor backward-compatibility:

      "P4" (486DX / Pentium / Athlon +):
          general-purpose, full-accuracy libraries with maximum compatibility
      "P6" (Pentium III+):
          gaining up to 50% higher speed in VF_ functions through the use
          of SSE commands, while sacrificing 2-3 digits of accuracy
      "P7" (Pentium 4+):
          same as "P6", plus about 20% higher speed for VD_ functions
          through the use of SSE2 commands, sacrificing 2-3 digits
      "P8" (AMD64xxx or Intel Core2xxx):
          same as "P7", plus about 40% higher speed for VCF_ functions
          involving complex multiplications through the use of SSE3
          commands;  VCD_ functions typically gain only 0-10%.
      "Multi-core" (AMD64x2+, Core2Duo+, or multi-processor configurations),
          auto-threading libraries, gaining speed by distributing the work-
          load over the available processor cores
      "Large-Vector", gaining some speed through by-passing the cache for
          very large vectors on single-processor machines (this set of
          libraries will probably be dropped from future OptiVec versions,
          once single-processor machines have become obsolete)
      "Dimension-consistency debug" (only P4).


-  is available with printed documentation.

-  entitles you to two years of free updates
   (by downloading from our web site)

-  costs USD 249 or EUR 199 for the commercial edition,
         USD  99 or EUR  89 for the educational edition,
   and can be ordered by e-mail from the author or through

   ShareIt:
   OptiVec for Borland / CodeGear C++:
      http://www.shareit.com/programs/101557.htm         (English handbook)
      http://www.shareit.com/deutsch/programs/101556.htm (German handbook)
   OptiVec for MS Visual C++ / Visual Studio:
      http://www.shareit.com/programs/103421.htm

See chapter 1.3 of the file HANDBOOK.HTM for further details about
ordering.


    * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Copyright (C) OptiCode - Dr. Martin Sander Software Dev. 1996-2008
