
-----------------------------------------
DLR CALLAB -- INDEX
-----------------------------------------

 INTRODUCTION
 OPERATION
 OPERATION: INPUT
 OPERATION: DATA PROCESSING
 OPERATION: OUTPUT
 CONTACT
 REFERENCES



-----------------------------------------
DLR CALLAB -- INTRODUCTION
-----------------------------------------


DLR CalLab estimates both the intrinsic and the extrinsic parameters
of either a single camera or a stereo-camera (i.e. a constellation of
two or more cameras rigidly attached to one another).

Intrinsic parameters mainly describe image magnification, the lens and
sensor (e.g. chip) distortions, as  well as the digitization  process.
These are part of the non-linear transformation between coordinates in
the camera frame S_C  and the image memory  frame S_M. The perspective
distortion is however  solely related  to  the extrinsic  pose of  the
camera (absolute extrinsics).

Extrinsic  parameters  describe the rigid-body transformations between
the camera frame S_C and either the world frame S_0 or the Tool Center
Point (TCP) frame S_T.  The former transformation changes at different
instants, whereas the  latter remains constant as  long as the cameras
are rigidly attached to the TCP.

This calibration tool does not yet cope with cameras that do not
conform to the pinhole camera model, nor with more general active
vision systems.



-----------------------------------------
DLR CALLAB -- OPERATION
-----------------------------------------


Input formats, data processing, and output formats.


-----------------------------------------
DLR CALLAB -- OPERATION: INPUT
-----------------------------------------


DLR CalLab assumes a preceding image processing stage.  This stage
(e.g. DLR CalDe) should cope with the problem of detecting the special
points of the calibration object captured in the images.  These points
represent the ground-truth information required by the calibration
process for the estimation of the camera parameters.  The user also
has the option of using a calibration algorithm that estimates the
actual dimensions of the calibration pattern, in menu
Settings->General settings->Object pattern scales; refer to
[Strobl08].  'Detecting' in this case refers not only to locating
points in the image memory frame S_M but also to identifying them
correctly, in order to assign them to the correct object-related
ground-truth positions.  In general, the calibration object may be
three-dimensional (3-D).  DLR CalLab is programmed to accept 3-D
calibration objects (with the exception of the computation of the
homographies).  Usually, however, the calibration object is 2-D
(e.g. a chessboard-like plate), imaged by the cameras from different
vantage points.  2-D calibration objects are easier and cheaper to
manufacture and to measure.  Please note that the correct operation of
DLR CalLab using 3-D calibration objects has not yet been verified.

Feature-detecting programs like DLR CalDe provide a list of detected
points (or blob centroids) that relates estimated image point
coordinates in S_M (starting from the very center of the upper left
corner pixel) with their actual coordinate counterparts in the object
frame S_P.

DLR CalLab expects one file for every imaging station.  File names
must start with a common project name, be numbered from 0 upwards, and
end with the extension *.pts, i.e. "<alphanumeric#i>.pts".  Examples:
shot0.pts, shot1.pts, etc.  Each file contains point lists for every
camera within the stereo-camera rig, in the same order.  In addition,
the files include the size of the images (for computing purposes) and
the rigid-body transformation world->TCP.  The world->TCP
transformation moves coordinates in the TCP frame S_T into themselves
in the world frame S_0 or, put otherwise, transforms the world frame
S_0 into the TCP frame S_T.  If the user does not require TCP-related
extrinsic parameters, DLR CalDe will write an arbitrary
transformation.  The transformation has to be represented in Euler ZYX
(movable coordinate frame) coordinates: first the X, Y, and Z angles
in degrees, and then the X, Y, and Z translations in mm.
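As an illustration, the six values of the world->TCP line can be
converted into a homogeneous 4x4 matrix roughly as follows (a Python
sketch; the intrinsic ZYX composition order R = Rz Ry Rx is our
assumption based on the description above):

```python
import numpy as np

def pose_from_pts_line(ax_deg, ay_deg, az_deg, tx, ty, tz):
    """Build a 4x4 homogeneous world->TCP transformation from the six
    values on the '#World to TCP' line of a *.pts file.

    Assumption: Euler ZYX with a movable (intrinsic) frame composes as
    R = Rz(az) @ Ry(ay) @ Rx(ax); angles in degrees, translations in mm.
    """
    ax, ay, az = np.radians([ax_deg, ay_deg, az_deg])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T
```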

The following example outlines this format.  This particular *.pts
ASCII file contains information for two cameras (a stereo-camera):


CIP
#World to TCP
     -158.537      5.19935     -54.5185      350.358      50.6219      325.064
#
# LEFT CAMERA
#
# WIDTH  HEIGHT
         720         576
# 
# Number of Points
         190
# GRID Coordinates            PIXEL Coordinates
# x    y    z                  x          y
     -80.1600     -100.050      0.00000      168.362      513.732
     -60.1200     -100.050      0.00000      156.207      441.140
     -40.0800     -100.050      0.00000      147.325      374.570
     -20.0400     -100.050      0.00000      141.238      314.852


[...]


      60.1200      180.090      0.00000      613.784      168.267
      80.1600      180.090      0.00000      595.977      145.182
#
# RIGHT CAMERA
#
# WIDTH  HEIGHT
         720         576
# 
# Number of Points
         205
# GRID Coordinates            PIXEL Coordinates
# x    y    z                  x          y
      40.0800     -140.070      0.00000      21.4794      186.350
      60.1200     -140.070      0.00000      34.0763      156.933
      0.00000     -120.060      0.00000      28.3062      255.521
      20.0400     -120.060      0.00000      42.1205      217.531


[...]


      80.1600      180.090      0.00000      650.859      123.332
      100.200      180.090      0.00000      637.205      99.9516




Caveat: For historical reasons the user must also provide the original
images to DLR CalLab in PNG format (both gray and color PNG formats
are allowed).  This allows for the refinement of the detected point
set in relation to the visual appearance of the features.


To recapitulate, we detail the sequence of input operations:

1. File -> Load images.  Loads PNG images from the cameras to be
calibrated.  At least three images from each camera are required.
Again, file names start with a common project name and are numbered
from 0 upwards.  The camera identification follows, separated by a
period.  For stereo rigs with just two cameras this may be either
'upper' or 'lower,' or 'left' or 'right,' but numbers ('0,' '1,' '2,'
etc.) are also allowed.  This identification order must agree with the
order within the point lists: top-down in the *.pts file corresponds
to upper-lower, left-right, or 0-1-2-etc.  After that, and separated
by a period, the user has free space to differentiate the data.  The
files must end with the extension *.png:
"<alphanumeric#i>.(left|right|upper|lower|0|1|2|...).*png".  Examples:
shot0.upper.raw.png, shot0.lower.raw.png, shot1.upper.raw.png...   or
shot0.left.free.png, shot0.right.free.png, shot1.left.free.png...  or
shot0.0.raw.png, shot0.1.raw.png, shot0.2.raw.png, shot1.0.raw.png,
shot1.1.raw.png, shot1.2.raw.png...
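This naming convention can be checked with a small helper (a
hypothetical sketch; DLR CalLab does not ship this function):

```python
import re

# Hypothetical helper: split an image file name according to the
# convention "<project><i>.<camera-id>.<free text>.png" described above.
PATTERN = re.compile(
    r'^(?P<project>[A-Za-z]+)(?P<station>\d+)\.'
    r'(?P<camera>left|right|upper|lower|\d+)\.'
    r'(?:.+\.)?png$')

def split_image_name(name):
    """Return (project, station, camera-id) or None if non-conforming."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return m.group('project'), int(m.group('station')), m.group('camera')
```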

2. File -> Load point files.  Loads the abovementioned *.pts files for
every imaging station.



-----------------------------------------
DLR CALLAB -- OPERATION: DATA PROCESSING
-----------------------------------------


For DLR CalLab to obtain optimal results, some user participation is
recommended.  This concerns data and numerical optimization quality
control, as well as specifications on e.g. the desired distortion
model.  We next summarize these tasks in a sequence of operations:


1. Data verification -- preprocessing.

Image processing stages are only robust to a certain extent, and in
general it is necessary to refine their results.  A further way to
verify these data is based on the combination of their ground-truth
information with the parameterized camera model.  We here exploit this
information in order to discard both highly noisy and even mistakenly
detected points.  We first perspectively project the ground-truth
information and employ a simple distortion model.  In this case we
recommend radial distortion with 3rd and maybe 5th order polynomial
terms.  When starting 'Run Options->Run Stage 1,' the camera
perspective model is first estimated by means of homographies between
the image points and their ground-truth counterparts.  Estimation
results are represented with green crosses.  IDL refines this first
estimation numerically by means of the function MPFIT -- an
implementation of the Levenberg-Marquardt algorithm.  Internal
parameters (including relative camera poses) and absolute external
poses are optimized in order to minimize the distances between
detected and expected (model) points in S_M.
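The homography initialization mentioned above can be sketched with a
plain DLT (cf. [Sturm99], [Zhang98]); this is an illustrative
stand-in, not CalLab's implementation, and it omits the usual data
normalization:

```python
import numpy as np

def estimate_homography(obj_xy, img_uv):
    """Direct linear transform (DLT) sketch for the plane-to-image
    homography used to initialize the perspective model.
    obj_xy, img_uv: (N, 2) arrays of corresponding points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(obj_xy, img_uv):
        # each correspondence contributes two linear constraints on h
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    H = Vt[-1].reshape(3, 3)          # null-space vector = h, up to scale
    return H / H[2, 2]
```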

Residuals after this optimization process point to mistaken points or
to regions where points show low accuracy (especially if they are in a
minority).  For example, distant regions from tilted vantage points
lead to redundant, very noisy data in a very small image region.
These in turn lead to bad homography estimations and eventually to
wrongly optimized parameters.  The user should now magnify residuals
with the 'Zoom Factor,' then circle around these erroneous points with
the left mouse button to select them, and afterwards delete them by
pressing 'Del.'  The absolute pixel error histograms on the right-hand
side of the GUI window may give helpful advice in this regard.  In the
absence of systematic errors, camera error histograms are expected to
present a Chi-square distribution.  You may click in the histograms to
remove any points yielding unnaturally high errors.  Repeat Stage 1
and refine the point set until the maximum pixel error amounts to
approx. three times the pixel error standard deviation (image RMS
error) -- not even systematic errors should be as big as three times
the standard deviation, and thus these certainly point to mistakenly
detected points.
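The three-times-RMS rule of thumb can be expressed compactly (an
illustrative sketch, not part of the GUI workflow):

```python
import numpy as np

def flag_outliers(residuals_px, factor=3.0):
    """Sketch of the rule described above: flag points whose absolute
    pixel residual exceeds `factor` times the image RMS error."""
    r = np.asarray(residuals_px, float)
    rms = np.sqrt(np.mean(r ** 2))
    return np.abs(r) > factor * rms
```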

This robustness-related procedure has not been automated, since the
decisions depend to a great extent on the cameras at hand.  The user
should save these refined lists of points by means of 'File->Save all
point files.'


2. Get optimal intrinsic parameters. Stage 1.

The refined set of points may still be considered systematically
erroneous if a more sophisticated camera model comes into question,
especially its lens distortion model.  As aforementioned, radial
distortion of 3rd and 5th orders is a simple but at the same time very
powerful way of describing lens distortion.  More sophisticated
distortion models may also be employed: DLR CalLab enables the user to
choose radial distortion of 3rd, 5th, and 7th orders, decentering
distortion of 2nd and 4th orders, as well as thin-prism distortion of
2nd and 4th orders [cf. Weng92, Manual of Photogrammetry 5th Edition,
Heikkilä97, Willson94].  Nevertheless, more sophisticated models do
have drawbacks: whether an appropriate reconstruction tool is at one's
disposal, their computational costs and, most importantly, the
numerical instability they may lead to.  This instability depends on
the optimization method used, on the variety of the point set and
poses, on the actual cameras, and on the level of sophistication of
the distortion model.  Complex models only rarely lead to implausible
parametrizations, whereas simple ones never fail.
Overparametrization may cause increased variances of the estimated
parameters and consequently drift, especially in the case of few
measurements.  In this regard, and in this application, it is the user
who has to take the decision.  A common practice is the gradual
expansion of the distortion model.  Another practice is to compare
parameters from identical cameras.  Problems mostly occur (if any)
when releasing both decentering and thin-prism parameters at the same
time, since they are similar in effect; refer to [Wang06].
Decentering distortion only occurs in the case of very cheap or
damaged cameras.  In general, extended distortion models are often not
worth the trouble.
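For illustration, the distortion terms listed above can be written out
on normalized image coordinates.  The coefficient signs and orderings
below follow common [Weng92]-style conventions and may differ from
CalLab's internal ones:

```python
def distort(xn, yn, k=(0.0, 0.0, 0.0), p=(0.0, 0.0), s=(0.0, 0.0)):
    """Sketch of the distortion map on normalized image coordinates:
    radial (k1..k3: 3rd/5th/7th order), decentering (p1, p2) and
    thin-prism (s1, s2) terms.  Conventions are an assumption, not
    CalLab's exact parameterization."""
    r2 = xn * xn + yn * yn
    radial = k[0] * r2 + k[1] * r2 ** 2 + k[2] * r2 ** 3
    dx = (xn * radial + p[0] * (r2 + 2 * xn * xn)
          + 2 * p[1] * xn * yn + s[0] * r2)
    dy = (yn * radial + p[1] * (r2 + 2 * yn * yn)
          + 2 * p[0] * xn * yn + s[1] * r2)
    return xn + dx, yn + dy
```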

The user should also pay attention to the termination condition of
every optimization process.  In particular, the numerical
optimizations should not reach the specified maximum number of
iterations.  Some authors claim that numerical instability is avoided
by splitting the estimation process into two separate numerical
optimizations -- the second one being responsible for the distortion
parameters -- and reiterating both optimizations.  This procedure does
not actually improve results, but worsens them and slows down the
computation; it was therefore not implemented here.

On the flip side, however, quantization errors are said to be
comparable with any potential benefits of more sophisticated
distortion models.  This statement cannot hold for different types of
cameras to the same extent.  Some authors [Tsai87, Zhang98] state that
the distortion is especially dominated by the first radial distortion
term.  This, again, does not hold for every camera.  In fact, with
quite a few off-the-shelf cameras results improve by up to 10% when
adding 7th order radial distortion or 2nd order decentering or
thin-prism distortions with respect to the predominant radial
distortion model of 3rd and 5th orders.  Sometimes one simply cannot
afford to throw away any accuracy.

Under  the correct camera model  assumption, the detected image points
are  the  only   data   supposed to  be   erroneous,  contaminated  by
independently   distributed   Gaussian     noise in    their location.
Therefore, the aforementioned   optimization method supplies   maximum
likelihood estimation (unbiased and with minimum variance) of both the
intrinsic and the absolute extrinsic parameters.

After running Stage 1, the log window will inform about the
termination condition of the numerical optimization.  Optimized
intrinsic parameters are shown in the system terminal (only under
Linux, Solaris, and OS X) and when saving results.  If the user is not
interested in TCP-related extrinsic parameters, this is the end of the
calibration process.  In this case the reader may skip the next
subsection.


3. Get optimal TCP-related extrinsic parameters. Stage 2.

Some 3-D vision tasks rely on external positioning systems, and in
this context a good estimation of the pose of the (stereo-)camera with
respect to the positioning system becomes crucial.  This is the
hand-eye calibration problem.  The underlying idea reads as follows:
one can find this relationship by moving the positioning system and
observing the resulting motion of the (stereo-)camera.  Since Stage 1
supplies accurate (optimal) absolute extrinsic parameters of the
(stereo-)camera, both these and the positioning readings may be used
as a starting point for hand-eye calibration.

Absolute poses of the (stereo-)camera (object->camera), together with
the positioning readings (world->TCP), should only differ in two
constant transformations: object->world and TCP->camera.  This
condition rarely holds in reality for every imaging station.  Since
the absolute poses of the camera from the last stage are usually
highly accurate (and most importantly unbiased), and the cameras and
the object remain static with respect to the TCP and to the world base
S_0 respectively, the errors responsible for these discrepancies
largely stem from the positioning system.  Examples of such systems
are active robotic manipulators, passive ones (e.g. the FARO arm),
non-linked ones like infrared tracking systems (e.g. ARTtrack
cameras), etc.  In fact, these errors usually exceed potential camera
modeling errors.

In order to optimally estimate the TCP->camera (and/or object->world)
transformations by maximizing likelihood, one has to determine both
the error sources and their probability distributions.  On the one
hand, and as mentioned above, the external positioning errors are
usually to be considered the main cause of any inconsistency.  On the
other hand, the intrinsic nature of these world->TCP errors often
remains unidentified, but they may be approximated by independent,
non-identically distributed Gaussian distributions both in position
and in orientation.  The maximum likelihood estimate of the
TCP->camera transformation is now the transformation that minimizes
both position and orientation errors of the positioning system subject
to a particular metric on SE(3) -- refer to [Strobl06].  Basically,
Euclidean distances and angles are optimally merged by weighting them
by the position/orientation accuracy ratio, which is a characteristic
of the positioning system at hand and is automatically adapted by the
algorithm.
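One plausible form of such a weighted SE(3) error is sketched below;
the exact cost function in DLR CalLab may differ (refer to
[Strobl06]), and `ratio` here stands for the position/orientation
accuracy ratio mentioned above:

```python
import numpy as np

def pose_error(R_err, t_err, ratio):
    """Sketch of a weighted SE(3) error: Euclidean translation error
    and residual rotation angle merged via the position/orientation
    accuracy ratio (mm of position error per radian of orientation
    error).  R_err: 3x3 residual rotation, t_err: residual translation."""
    cos_angle = (np.trace(np.asarray(R_err)) - 1.0) / 2.0
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return np.linalg.norm(t_err) ** 2 + (ratio * angle) ** 2
```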

Other practices minimize absolute pixel reprojection errors, or some
algebraic distances.  The latter approach is statistically wrong and
should never be taken except perhaps for initializations or
time-critical processes.  The former approach, minimizing absolute
pixel errors, is only sensible in the case of very high image
processing errors (refer to [Strobl09]) and should, if possible, be
jointly performed with the intrinsic parameter estimation.  This
approach minimizing reprojection errors reaches smaller image RMS
errors on the particular calibration point set, but mostly at the cost
of much bigger world->TCP positioning errors than the former.  This
option (not coupled with the intrinsic parameter estimation) is still
possible within DLR CalLab (menu Settings->General settings->Minimize
projection errors).

DLR CalLab performs  by  default the Nelder-Mead Simplex  optimization
but it also offers the possibility to use other numerical optimization
methods.   Again, the  user should  pay attention  to the  termination
condition of  the  optimization process, especially  in order  not  to
reach the maximum number of iterations.


4. Estimate positioning system accuracy. Stage 3.

In accordance with the abovementioned hand-eye calibration, the
world->TCP transformation of the positioning system presents
independent, non-identically distributed Gaussian errors -- the only
premise being the position/orientation accuracy ratio.  From now on,
and as a consequence of Stage 2, we are not only in a position to
estimate the actual magnitude of these transformation errors, but also
to represent them in image projections considering both intrinsic and
extrinsic parameters along with the corrected world->TCP
transformations.  This, again, is a measure of the correctness of the
ratio used in Stage 2.


--------------------------------
DLR CALLAB -- OPERATION: OUTPUT
--------------------------------


It is possible to save the calibration results in an ASCII file in
three different formats:


-------------------------------------------------
DLR CALLAB -- OPERATION: OUTPUT -- MATLAB FORMAT
-------------------------------------------------



This .m file may be executed by Matlab.  It contains the different
camera parameters for one or two cameras (if two, rigidly attached to
one another) in the same way as the Bouguet toolbox does.  Please note
that the skew parameter does not correspond to the skew parameter in
the following formats, since here it is divided by scaleX.  Similarly,
kc_ is reordered according to that toolbox.  In addition, the
extrinsic camera parameters (poses) are included.  You may want to
refer to:

http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/parameters.html
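The relation between the two skew conventions can be sketched as
follows (a hypothetical helper; the identification of scaleX with the
first diagonal entry of the intrinsic matrix is our assumption based
on the Bouguet parameter page above):

```python
def skew_to_bouguet(A):
    """Sketch: the HH/SensorModel formats store the raw skew entry
    A[0][1] of the intrinsic matrix, whereas the Matlab/Bouguet export
    stores it divided by the horizontal scale, i.e.
    alpha_c = A[0][1] / A[0][0] (assumption)."""
    return A[0][1] / A[0][0]
```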


-------------------------------------------------------
DLR CALLAB -- OPERATION: OUTPUT -- SENSOR-MODEL FORMAT
-------------------------------------------------------


SensorModel is a C++ class maintained at the Institute of Robotics and
Mechatronics of the German Aerospace  Center (DLR).  This OOP approach
is a general framework for different sensors attached  to a robot. All
parameters here are the same as in the previous format, except for the
skew parameter.


---------------------------------------------
DLR CALLAB -- OPERATION: OUTPUT -- HH FORMAT
---------------------------------------------


This format is also supported by SensorModel and stems from the work
of Dr. Heiko Hirschmueller.  In this format every estimated distortion
parameter is saved -- they are defined as in [Weng92].  An example of
a *.par file follows:


# Stereo camera calibration parameters -- DLR CalLab
# Created: 21-Sep-2005 14:51. Pictures taken from folder 050715.
left.A=[582.967047 1.085273 351.518722; 0.000000 637.142179 270.907065; 0.000000 0.000000 1.000000]
left.k2=0.144080
left.k1=-0.380399
R=[0.934101 -0.021456 0.356364; 0.021107 0.999765 0.004870; -0.356385 0.002973 0.934335]
T=[112.851588 -1.714460 31.111203]
right.A=[581.217191 1.590155 361.527961; 0.000000 635.306281 265.353278; 0.000000 0.000000 1.000000]
right.k2=0.139243
right.k1=-0.374112
video.width=720
video.height=576

Rc=[-0.314205 -0.187671 0.930621; 0.908131 -0.345145 0.237009; 0.276719 0.919595 0.278876]
Tc=[10.214825 -65.220227 -28.727763]



The parameters have to be understood as follows:

left.A is the camera intrinsic matrix of the left/upper/0 camera
left.k2 is the 5th order radial distortion parameter for the same camera
left.k1 is the 3rd order radial distortion parameter for the same camera.
right.* are the same parameters for the right/lower/1 camera.

T is the translation of camleft->camright. 
R is the rotation matrix of camleft->camright.

video.width  is the width  of the images (for computational purposes).
video.height is the height of the images (for computational purposes).

Rc is the rotation matrix of camleft->TCP (if any).
Tc is the translation of camleft->TCP (if any).

(A->B moves the A reference frame to the B reference frame and, at the
same time, transforms points in  the B reference frame into themselves
in the A reference frame)
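The A->B convention can be restated in code (a short numpy sketch
using the stereo pair above):

```python
import numpy as np

def to_left_frame(R, T, p_right):
    """R, T (camleft->camright) map a point given in the right-camera
    frame into the left-camera frame: p_left = R p_right + T."""
    return np.asarray(R) @ np.asarray(p_right) + np.asarray(T)
```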


--------------------------------------------------------
DLR CALLAB -- OPERATION: OUTPUT -- IMPORTANT CONVENTION
--------------------------------------------------------


Pixel coordinates are defined such that [0,0] is the center of the
upper left pixel of the image.  As a result, [-0.5,-0.5] is the upper
left corner of the upper left pixel of the image, and
[nu-1+0.5,nv-1+0.5] is the coordinate of the bottom right corner of
the bottom right pixel.  nu and nv are the number of pixels of the
image in the u and v directions.  This convention mainly concerns the
principal point estimation.
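In code, the convention reads (a trivial sketch):

```python
def pixel_extents(nu, nv):
    """Pixel-center convention above: [0,0] is the center of the upper
    left pixel; returns the image's outer corner coordinates."""
    upper_left = (-0.5, -0.5)
    bottom_right = (nu - 1 + 0.5, nv - 1 + 0.5)
    return upper_left, bottom_right
```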


--------------------------------
DLR CALLAB -- CONTACT
--------------------------------


Dipl.-Ing. Klaus Strobl  +49-8153 28 2482
Dipl.-Inf. Wolfgang Sepp


--------------------------------
DLR CALLAB -- REFERENCES
--------------------------------


[Tsai87]  Roger Y. Tsai.  A versatile Camera Calibration Technique for
High-Accuracy  3D  Machine  Vision   Metrology Using  Off-the-Shelf TV
Cameras  and  Lenses.    IEEE  Journal  of   Robotics  and Automation,
Vol. RA-3, No. 4, pages 323-344, August 1987.

[Weng92] J.  Weng, P. Cohen, and M.   Herniou. Camera calibration with
distortion   models and   accuracy  evaluation.   IEEE  PAMI,  14(10):
965-980, 1992.

[Willson94] Reg G. Willson and Steven A. Shafer. What is the Center of
the Image?   Journal of the  Optical Society  of  America A, Vol.  11,
No. 11, pp.2946-2955, November 1994.

[Heikkilä97] J. Heikkilä and O. Silvén. A Four-step Camera Calibration
Procedure with Implicit Image Correction. IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR'97), San
Juan, Puerto Rico, pp. 1106-1112, 1997.

[Zhang98] Z.  Zhang. A flexible  new technique for camera calibration.
Technical Report MSR-TR-98-71, Microsoft Research, December 1998.

[Sturm99] Peter F. Sturm and Steve J. Maybank. On Plane-Based Camera
Calibration: A General Algorithm, Singularities, Applications.
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Fort Collins, USA, pages 432-437, June 1999.

[Manual of   Photogrammetry  5th Edition]   Manual of  Photogrammetry.
American Society of Photogrammetry, 5th edition.

[HorneggerTomasi99] J. Hornegger  and C. Tomasi. Representation issues
in the   ML estimation of  camera  motion.   Proceedings of  the  IEEE
International Conference on  Computer Vision, pages 640-647, September
1999.

[Strobl06] K. H. Strobl and G. Hirzinger. Optimal Hand-Eye Calibration.
Proceedings of the  IEEE/RSJ  International  Conference on Intelligent
Robots and Systems (IROS 2006), Beijing, China, pp. 4647-4653, October
2006.

[Wang06] J. Wang,  F. Shi,  J. Zhang,  and  Y. Liu.  A New Calibration
Model  and   Method of Camera  Lens   Distortion.  Proceedings  of the
IEEE/RSJ International   Conference on Intelligent  Robots and Systems
(IROS 2006), Beijing, China, pp. 5713-5718, October 2006.

[Strobl08]  K. H.  Strobl and  G.  Hirzinger. More Accurate Camera and
Hand-Eye    Calibrations  with   Unknown   Grid    Pattern Dimensions.
Proceedings  of the  IEEE   International Conference on  Robotics  and
Automation (ICRA 2008), Pasadena,  California, USA, pp. 1398-1405, May
2008.

[Strobl09] K. H.  Strobl, W. Sepp, and G.  Hirzinger. On  the Issue of
Camera Calibration with Narrow  Angular Field of View.  Proceedings of
the IEEE/RSJ   International   Conference on Intelligent   Robots  and
Systems (IROS 2009), St. Louis, MO, USA, pp. 309-315, October 2009.

