Actions

icon Post
text/html Subscribe
text/html Unsubscribe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

patch: FFT refactored


  • To: VSIPL++ Developers List <vsipl++@xxxxxxxxxxxxxxxx>
  • Subject: patch: FFT refactored
  • From: Stefan Seefeld <stefan@xxxxxxxxxxxxxxxx>
  • Date: Mon, 06 Mar 2006 23:14:15 -0500

Please find attached a patch containing a first step towards a refactored
FFT implementation. This patch factors out different backend into their
respective implementation (and subdirectory, for simpler maintenance).
Once finished, different backends can be enabled via configure at the same
time, and a compile-/runtime-dispatcher will instantiate the appropriate
backend for a given FFT(M) object.

Here is a short list of the new files:

src/vsip/impl/fft.hpp : Contains the new public Fft(m) API.
src/vsip/impl/fft/backend.hpp : Contains the backend interface definition.
src/vsip/impl/fft/factory.hpp : Contains the generic backend factory bits.
src/vsip/impl/fft/util.hpp : Contains some utility templates.
src/vsip/impl/fft/workspace.hpp : Contains the code responsible for temporary buffers.
src/vsip/impl/fftw3/ : Directory containing the fftw3 bridge (eventually).
src/vsip/impl/ipp/ : Directory containing IPP glue code (eventually).
src/vsip/impl/sal/ : Directory containing SAL glue code (eventually).

The SAL binding is complete as far as the fft.cpp and fftm.cpp tests are
concerned (these new bindings directly support split complex transforms).

However, a number of stubs are still empty, or even wrong. To fill / fix
them I would prefer to start by writing more tests to get better coverage
of all the supported parameters (non-square matrixes, notably, as well as
subviews where strides differ from sizes), before moving forward.

This new code is mostly independent of existing files, i.e. it can coexist
and even be tested with minimal changes to the existing sources / build system.

Thanks,
		Stefan
Index: GNUmakefile.in
===================================================================
RCS file: /home/cvs/Repository/vpp/GNUmakefile.in,v
retrieving revision 1.45
diff -u -r1.45 GNUmakefile.in
--- GNUmakefile.in	24 Jan 2006 17:33:15 -0000	1.45
+++ GNUmakefile.in	7 Mar 2006 03:55:19 -0000
@@ -101,6 +101,7 @@
 
 VSIP_IMPL_HAVE_IPP := @VSIP_IMPL_HAVE_IPP@
 VSIP_IMPL_HAVE_SAL := @VSIP_IMPL_HAVE_SAL@
+VSIP_IMPL_SAL_FFT := @VSIP_IMPL_SAL_FFT@
 VSIP_IMPL_HAVE_BLAS := @VSIP_IMPL_HAVE_BLAS@
 VSIP_IMPL_HAVE_LAPACK := @VSIP_IMPL_HAVE_LAPACK@
 
Index: configure.ac
===================================================================
RCS file: /home/cvs/Repository/vpp/configure.ac,v
retrieving revision 1.84
diff -u -r1.84 configure.ac
--- configure.ac	4 Mar 2006 23:03:05 -0000	1.84
+++ configure.ac	7 Mar 2006 03:55:19 -0000
@@ -1614,7 +1614,10 @@
 
 #
 # Done.
-#
+mkdir -p src/vsip/impl/sal
+mkdir -p src/vsip/impl/ipp
+mkdir -p src/vsip/impl/fftw3
+
 AC_OUTPUT
 
 #
Index: src/vsip/GNUmakefile.inc.in
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/GNUmakefile.inc.in,v
retrieving revision 1.14
diff -u -r1.14 GNUmakefile.inc.in
--- src/vsip/GNUmakefile.inc.in	3 Mar 2006 14:30:53 -0000	1.14
+++ src/vsip/GNUmakefile.inc.in	7 Mar 2006 03:55:19 -0000
@@ -16,12 +16,30 @@
 src_vsip_CXXFLAGS := $(src_vsip_CXXINCLUDES)
 
 src_vsip_cxx_sources := $(wildcard $(srcdir)/src/vsip/*.cpp)
+src_vsip_cxx_sources := $(filter-out %fft-float.cpp \
+                                     %fft-double.cpp \
+                                     %fft-ldouble.cpp, \
+                          $(src_vsip_cxx_sources))
 ifdef VSIP_IMPL_HAVE_IPP
 src_vsip_cxx_sources += $(srcdir)/src/vsip/impl/ipp.cpp
 endif
 ifdef VSIP_IMPL_HAVE_SAL
 src_vsip_cxx_sources += $(srcdir)/src/vsip/impl/sal.cpp
 endif
+ifdef VSIP_IMPL_SAL_FFT
+src_vsip_cxx_sources += $(srcdir)/src/vsip/impl/sal/fft.cpp
+endif
+ifdef VSIP_IMPL_IPP_FFT
+src_vsip_cxx_sources += $(srcdir)/src/vsip/impl/ipp/fft.cpp
+endif
+ifdef VSIP_IMPL_FFTW3
+src_vsip_cxx_sources += $(srcdir)/src/vsip/impl/fftw3/fft.cpp
+endif
+ifdef 0
+src_vsip_cxx_sources += $(srcdir)/src/vsip/fft-float.cpp
+src_vsip_cxx_sources += $(srcdir)/src/vsip/fft-double.cpp
+src_vsip_cxx_sources += $(srcdir)/src/vsip/fft-ldouble.cpp
+endif
 src_vsip_cxx_sources += $(srcdir)/src/vsip/impl/simd/vmul.cpp
 src_vsip_cxx_objects := $(patsubst $(srcdir)/%.cpp, %.$(OBJEXT), $(src_vsip_cxx_sources))
 cxx_sources += $(src_vsip_cxx_sources)
Index: src/vsip/signal-window.cpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/signal-window.cpp,v
retrieving revision 1.4
diff -u -r1.4 signal-window.cpp
--- src/vsip/signal-window.cpp	7 Dec 2005 19:22:05 -0000	1.4
+++ src/vsip/signal-window.cpp	7 Mar 2006 03:55:19 -0000
@@ -11,7 +11,7 @@
 ***********************************************************************/
 
 #include <vsip/selgen.hpp>
-#include "impl/signal-fft.hpp"
+#include "impl/fft.hpp"
 #include "impl/signal-freqswap.hpp"
 #include "impl/signal-window.hpp"
 
Index: src/vsip/signal.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/signal.hpp,v
retrieving revision 1.14
diff -u -r1.14 signal.hpp
--- src/vsip/signal.hpp	20 Dec 2005 03:01:32 -0000	1.14
+++ src/vsip/signal.hpp	7 Mar 2006 03:55:19 -0000
@@ -15,7 +15,7 @@
 ***********************************************************************/
 
 #include <vsip/impl/signal-types.hpp>
-#include <vsip/impl/signal-fft.hpp>
+#include <vsip/impl/fft.hpp>
 #include <vsip/impl/signal-conv.hpp>
 #include <vsip/impl/signal-corr.hpp>
 #include <vsip/impl/signal-window.hpp>
Index: src/vsip/impl/fft.hpp
===================================================================
RCS file: src/vsip/impl/fft.hpp
diff -N src/vsip/impl/fft.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fft.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,284 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/fft.hpp
+    @author  Stefan Seefeld
+    @date    2006-02-20
+    @brief   VSIPL++ Library: Fft & Fftm class definitions.
+*/
+
+#ifndef VSIP_IMPL_FFT_HPP
+#define VSIP_IMPL_FFT_HPP
+
+#include <vsip/support.hpp>
+#include <vsip/impl/config.hpp>
+#include <vsip/impl/signal-types.hpp>
+#include <vsip/impl/type_list.hpp>
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/fft/factory.hpp>
+#include <vsip/impl/fft/workspace.hpp>
+#ifdef VSIP_IMPL_HAVE_SAL
+#include <vsip/impl/sal/fft.hpp>
+#endif
+#ifdef VSIP_IMPL_HAVE_IPP
+#include <vsip/impl/ipp/fft.hpp>
+#endif
+#if defined(VSIP_IMPL_FFTW3)
+#include <vsip/impl/fftw3/fft.hpp>
+#endif
+
+namespace vsip
+{
+
+const int fft_fwd = -2;
+const int fft_inv = -1;
+
+namespace impl
+{
+namespace fft
+{
+/// The list of evaluators to be tried, in that specific order.
+typedef Make_type_list<
+#ifdef VSIP_IMPL_SAL_FFT
+  Mercury_sal_tag,
+#endif
+#ifdef VSIP_IMPL_IPP_FFT
+  Intel_ipp_tag,
+#endif
+#ifdef VSIP_IMPL_FFTW3
+  Fftw3_tag,
+#endif
+  None_type
+  >::type LibraryTagList;
+
+} // namespace vsip::impl::fft
+
+template <dimension_type D, typename inT, typename outT, int sD,
+	  return_mechanism_type rmT>
+class Fft_base : protected fft::workspace<D, inT, outT, sD < 0 ? 0 : sD>
+{
+public:
+  static dimension_type const dim = D;
+  typedef typename impl::Scalar_of<inT>::type scalar_type;
+
+  Fft_base(Domain<D> const &dom, scalar_type scale)
+    : fft::workspace<D, inT, outT, sD < 0 ? 0 : sD>
+        (impl::fft::io_size<D, inT, outT, sD>::size(dom),
+	 impl::fft::io_size<D, outT, inT, sD>::size(dom)),
+      input_size_(impl::fft::io_size<D, inT, outT, sD>::size(dom)),
+      output_size_(impl::fft::io_size<D, outT, inT, sD>::size(dom)),
+      scale_(scale)
+  {}
+
+  Domain<dim> const& 
+  input_size() const VSIP_NOTHROW 
+  { return this->input_size_;}
+  
+  Domain<dim> const& 
+  output_size() const VSIP_NOTHROW 
+  { return this->output_size_;}
+  
+  scalar_type 
+  scale() const VSIP_NOTHROW 
+  { return this->scale_;}
+  
+  bool 
+  forward() const VSIP_NOTHROW
+  { return sD == -2 || !impl::Is_complex<inT>::value;}
+  
+protected:
+  Domain<dim> input_size_;
+  Domain<dim> output_size_;
+  scalar_type scale_;
+};
+
+} // namespace vsip::impl
+
+template <template <typename, typename> class constView,
+	  typename inT,
+	  typename outT,
+	  int sD = 0,
+	  return_mechanism_type = by_value,
+	  unsigned nT = 0,
+	  alg_hint_type = alg_time>
+class Fft;
+
+template <template <typename, typename> class constView,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  unsigned nT,
+	  alg_hint_type ahT>
+class Fft<constView, inT, outT, sD, by_value, nT, ahT>
+  : public impl::Fft_base<impl::Dim_of_view<constView>::dim,
+			  inT, outT, sD, by_value>
+{
+  typedef 
+  impl::Fft_base<impl::Dim_of_view<constView>::dim, inT, outT, sD, by_value> base;
+  typedef 
+  impl::fft::factory<base::dim, inT, outT, sD, by_value,
+		     impl::fft::LibraryTagList> factory;
+public:
+  static dimension_type const dim = base::dim;
+
+  Fft(Domain<dim> const& dom, typename base::scalar_type scale)
+    VSIP_THROW((std::bad_alloc))
+    : base(dom, scale),
+      backend_(factory::create(dom, scale))
+  {}
+
+  template <typename BlockT>  
+  typename impl::fft::result<outT,BlockT>::view_type
+  operator()(constView<inT,BlockT> const& in)
+    VSIP_THROW((std::bad_alloc))
+  {
+    typedef impl::fft::result<outT,BlockT> traits;
+    typename traits::view_type out(traits::create(this->output_size()));
+    this->by_reference(this->backend_.get(), in.block(), out.block());
+    return out;
+  }
+
+private:
+  std::auto_ptr<impl::fft::backend<dim, inT, outT,
+				   impl::fft::axis<inT, outT, sD>::value,
+				   impl::fft::direction<inT, outT, sD>::value> >
+    backend_;
+};
+
+template <template <typename, typename> class constView,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  unsigned nT,
+	  alg_hint_type ahT>
+class Fft<constView, inT, outT, sD, vsip::by_reference, nT, ahT>
+  : public impl::Fft_base<impl::Dim_of_view<constView>::dim, inT, outT, sD, 
+			  vsip::by_reference>
+{
+  typedef impl::Fft_base<impl::Dim_of_view<constView>::dim, inT, outT, sD, 
+			 vsip::by_reference> base;
+  typedef impl::fft::factory<base::dim, inT, outT, sD, vsip::by_reference,
+			     impl::fft::LibraryTagList> factory;
+public:
+  static dimension_type const dim = base::dim;
+
+  Fft(Domain<dim> const& dom, typename base::scalar_type scale)
+    VSIP_THROW((std::bad_alloc))
+    : base(dom, scale),
+      backend_(factory::create(dom, scale))
+  {}
+
+  template <typename Block0, typename Block1,
+ 	    template <typename, typename> class View>
+  View<outT,Block1>
+  operator()(constView<inT,Block0> const& in, View<outT,Block1> out)
+    VSIP_NOTHROW
+  {
+    this->by_reference(this->backend_.get(), in.block(), out.block());
+    return out;
+  }
+
+  template <typename BlockT, template <typename, typename> class View>
+  View<outT,BlockT>
+  operator()(View<outT,BlockT> inout) VSIP_NOTHROW
+  {
+    this->in_place(this->backend_.get(), inout.block());
+    return inout;
+  }
+
+private:
+  std::auto_ptr<impl::fft::backend<dim, inT, outT, 
+				   impl::fft::axis<inT, outT, sD>::value,
+                                   impl::fft::direction<inT, outT, sD>::value> >
+    backend_;
+};
+
+template <typename inT,
+	  typename outT,
+	  int A = row,
+	  int D = fft_fwd, 
+	  return_mechanism_type = by_value,
+	  unsigned nT = 0,
+	  alg_hint_type = alg_time>
+class Fftm;
+
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  unsigned nT,
+	  alg_hint_type ahT>
+class Fftm<inT, outT, A, D, by_value, nT, ahT>
+  : public impl::Fft_base<2, inT, outT, 1 - A, by_value>
+{
+  // The sD template parameter in 2D Fft is '0' for column-first
+  // and '1' for row-first transformation. As Fftm's Axis parameter
+  // does the inverse, we use '1 - A' here to be able to share the same
+  // workspace base class.
+  typedef impl::Fft_base<2, inT, outT, 1 - A, by_value> base;
+  typedef impl::fftm::factory<inT, outT, A, D, by_value,
+			      impl::fft::LibraryTagList> factory;
+public:
+  Fftm(Domain<2> const& dom, typename base::scalar_type scale)
+    VSIP_THROW((std::bad_alloc))
+    : base(dom, scale),
+      backend_(factory::create(dom, scale))
+  {}
+
+  template <typename BlockT>  
+  typename impl::fft::result<outT,BlockT>::view_type
+  operator()(const_Matrix<inT,BlockT> const& in)
+    VSIP_THROW((std::bad_alloc))
+  {
+    typedef impl::fft::result<outT,BlockT> traits;
+    typename traits::view_type out(traits::create(this->output_size()));
+    this->by_reference(this->backend_.get(), in.block(), out.block());
+    return out;
+  }
+
+private:
+  std::auto_ptr<typename impl::fft::fftm<inT, outT, A == 0, D == -2> > backend_;
+};
+
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  unsigned nT,
+	  alg_hint_type ahT>
+class Fftm<inT, outT, A, D, vsip::by_reference, nT, ahT>
+  : public impl::Fft_base<2, inT, outT, 1 - A, vsip::by_reference>
+{
+  typedef impl::Fft_base<2, inT, outT, 1 - A, vsip::by_reference> base;
+  typedef impl::fftm::factory<inT, outT, A, D, vsip::by_reference,
+			      impl::fft::LibraryTagList> factory;
+public:
+  Fftm(Domain<2> const& dom, typename base::scalar_type scale)
+    VSIP_THROW((std::bad_alloc))
+    : base(dom, scale),
+      backend_(factory::create(dom, scale))
+  {}
+
+  template <typename Block0, typename Block1>
+  Matrix<outT,Block1>
+  operator()(const_Matrix<inT,Block0> const& in, Matrix<outT,Block1> out)
+    VSIP_NOTHROW
+  {
+    this->by_reference(this->backend_.get(), in.block(), out.block());
+    return out;
+  }
+
+  template <typename BlockT>
+  Matrix<outT,BlockT>
+  operator()(Matrix<outT,BlockT> inout) VSIP_NOTHROW
+  {
+    this->in_place(this->backend_.get(), inout.block());
+    return inout;
+  }
+
+private:
+  std::auto_ptr<typename impl::fft::fftm<inT, outT, A == 0, D == -2> > backend_;
+};
+
+} // namespace vsip
+
+#endif // VSIP_IMPL_FFT_HPP
Index: src/vsip/impl/metaprogramming.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/impl/metaprogramming.hpp,v
retrieving revision 1.11
diff -u -r1.11 metaprogramming.hpp
--- src/vsip/impl/metaprogramming.hpp	11 Jan 2006 16:22:45 -0000	1.11
+++ src/vsip/impl/metaprogramming.hpp	7 Mar 2006 03:55:20 -0000
@@ -125,6 +125,9 @@
 struct Int_type
 {};
 
+struct false_type { static const bool value = false; };
+struct true_type  { static const bool value = true; };
+
 } // namespace impl
 } // namespace vsip
 
Index: src/vsip/impl/signal-fft.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/impl/signal-fft.hpp,v
retrieving revision 1.34
diff -u -r1.34 signal-fft.hpp
--- src/vsip/impl/signal-fft.hpp	7 Mar 2006 02:15:22 -0000	1.34
+++ src/vsip/impl/signal-fft.hpp	7 Mar 2006 03:55:20 -0000
@@ -21,6 +21,7 @@
 #include <vsip/impl/fast-block.hpp>
 #include <vsip/impl/global_map.hpp>
 #include <vsip/impl/profile.hpp>
+#include <vsip/impl/metaprogramming.hpp>
 
 #include <new>
 
@@ -33,9 +34,6 @@
 namespace impl
 {
 
-struct false_type { static const bool value = false; };
-struct true_type  { static const bool value = true; };
-
 //
 // struct Fft_core
 //
Index: src/vsip/impl/fft/backend.hpp
===================================================================
RCS file: src/vsip/impl/fft/backend.hpp
diff -N src/vsip/impl/fft/backend.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fft/backend.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,355 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/fft/backend.hpp
+    @author  Stefan Seefeld
+    @date    2006-02-17
+    @brief   VSIPL++ Library: fft backend interfaces.
+
+    This file defines fft backend interfaces to be implemented by
+    third-party library bridges.
+*/
+
+#ifndef VSIP_IMPL_FFT_BACKEND_HPP
+#define VSIP_IMPL_FFT_BACKEND_HPP
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/impl/metaprogramming.hpp>
+
+namespace vsip
+{
+namespace impl
+{
+namespace fft
+{
+
+bool const forward = true;
+bool const inverse = false;
+  
+template <dimension_type D, typename inT, typename outT, 
+	  int Axis, bool Fwd> class backend;
+
+template <dimension_type D, typename T>
+struct backend_base
+{
+  static dimension_type const dim = D;
+  typedef T scalar_type;
+};
+
+/// 1D real forward FFT
+template <typename T, int Axis>
+class backend<1, T, std::complex<T>, Axis, true> : public backend_base<1, T>
+{
+public:
+  virtual ~backend() {}
+//   virtual bool require_copy(in_stride, out_stride);
+  /// real -> complex (interleaved)
+  virtual void by_reference(T *in, stride_type in_stride,
+			    std::complex<T> *out, stride_type out_stride,
+			    length_type length) = 0;
+  /// real -> complex (split)
+  virtual void by_reference(T *in, stride_type in_stride,
+			    std::pair<T *, T *> out, stride_type out_stride,
+			    length_type length) = 0;
+};
+
+/// 1D real inverse FFT
+template <typename T, int Axis>
+class backend<1, std::complex<T>, T, Axis, false> : public backend_base<1, T>
+{
+public:
+  virtual ~backend() {}
+  /// complex (interleaved) -> real
+  virtual void by_reference(std::complex<T> *in, stride_type in_stride,
+			    T *out, stride_type out_stride,
+			    length_type length) = 0;
+  /// real -> complex (split)
+  virtual void by_reference(std::pair<T *, T *> in, stride_type in_stride,
+			    T *out, stride_type out_stride,
+			    length_type length) = 0;
+};
+
+/// 1D complex FFT
+template <typename T, int Axis, bool Fwd>
+class backend<1, std::complex<T>, std::complex<T>, Axis, Fwd>
+  : public backend_base<1, T>
+{
+public:
+  virtual ~backend() {}
+  /// complex (interleaved) in-place
+  virtual void in_place(std::complex<T> *, stride_type, length_type) = 0;
+  /// complex (split) in-place
+  virtual void in_place(std::pair<T *, T *>, stride_type, length_type) = 0;
+  /// complex (interleaved) by-reference
+  virtual void by_reference(std::complex<T> *in, stride_type in_stride,
+			    std::complex<T> *out, stride_type out_stride,
+			    length_type length) = 0;
+  /// complex (split) by-reference
+  virtual void by_reference(std::pair<T *, T *>, stride_type in_stride,
+			    std::pair<T *, T *>, stride_type out_stride,
+			    length_type length) = 0;
+};
+
+/// 2D real forward FFT
+template <typename T, int Axis>
+class backend<2, T, std::complex<T>, Axis, true> : public backend_base<2, T>
+{
+public:
+  virtual ~backend() {}
+  /// real -> complex (interleaved) by-reference
+  virtual void by_reference(T *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+  /// real -> complex (split) by-reference
+  virtual void by_reference(T *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *>,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+};
+
+/// 2D real inverse FFT
+template <typename T, int Axis>
+class backend<2, std::complex<T>, T, Axis, false>
+  : public backend_base<2, T>
+{
+public:
+  virtual ~backend() {}
+  /// complex (interleaved) -> real by-reference
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+  /// complex (split) -> real by-reference
+  virtual void by_reference(std::pair<T *, T *>,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+};
+
+/// 2D complex FFT
+template <typename T, int Axis, bool Fwd>
+class backend<2, std::complex<T>, std::complex<T>, Axis, Fwd>
+  : public backend_base<2, T>
+{
+public:
+  virtual ~backend() {}
+  /// complex (interleaved) in-place
+  virtual void in_place(std::complex<T> *inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols) = 0;
+  /// complex (split) in-place
+  virtual void in_place(std::pair<T *, T *>,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols) = 0;
+  /// complex (interleaved) by-reference
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+  /// complex (split) by-reference
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+};
+
+/// 3D real forward FFT
+template <typename T, int Axis>
+class backend<3, T, std::complex<T>, Axis, true>
+  : public backend_base<3, T>
+{
+public:
+  virtual ~backend() {}
+  /// real -> complex (interleaved) by-reference
+  virtual void by_reference(T *in,
+			    stride_type in_x_stride,
+			    stride_type in_y_stride,
+			    stride_type in_z_stride,
+			    std::complex<T> *out,
+			    stride_type out_x_stride,
+			    stride_type out_y_stride,
+			    stride_type out_z_stride,
+			    length_type x_length,
+			    length_type y_length,
+			    length_type z_length) = 0;
+  /// real -> complex (split) by-reference
+  virtual void by_reference(T *in,
+			    stride_type in_x_stride,
+			    stride_type in_y_stride,
+			    stride_type in_z_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_x_stride,
+			    stride_type out_y_stridey,
+			    stride_type out_z_stride,
+			    length_type x_length,
+			    length_type y_length,
+			    length_type z_length) = 0;
+};
+
+/// 3D real inverse FFT
+template <typename T, int Axis>
+class backend<3, std::complex<T>, T, Axis, false> : public backend_base<3, T>
+{
+public:
+  virtual ~backend() {}
+  /// complex (interleaved) -> real by-reference
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_x_stride,
+			    stride_type in_y_stride,
+			    stride_type in_z_stride,
+			    T *out,
+			    stride_type out_x_stride,
+			    stride_type out_y_stride,
+			    stride_type out_z_stride,
+			    length_type x_length,
+			    length_type y_length,
+			    length_type z_length) = 0;
+  /// complex (split) -> real by-reference
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_x_stride,
+			    stride_type in_y_stride,
+			    stride_type in_z_stride,
+			    std::complex<T> *out,
+			    stride_type out_x_stride,
+			    stride_type out_y_stride,
+			    stride_type out_z_stride,
+			    length_type x_length,
+			    length_type y_length,
+			    length_type z_length) = 0;
+};
+
+/// 3D complex FFT
+template <typename T, int Axis, bool Fwd>
+class backend<3, std::complex<T>, std::complex<T>, Axis, Fwd>
+  : public backend_base<3, T>
+{
+public:
+  virtual ~backend() {}
+  /// complex (interleaved) in-place
+  virtual void in_place(std::complex<T> *inout,
+			stride_type x_stride,
+			stride_type y_stride,
+			stride_type z_stride,
+			length_type x_length,
+			length_type y_length,
+			length_type z_length) = 0;
+  /// complex (split) in-place
+  virtual void in_place(std::pair<T *, T *> inout,
+			stride_type x_stride,
+			stride_type y_stride,
+			stride_type z_stride,
+			length_type x_length,
+			length_type y_length,
+			length_type z_length) = 0;
+  /// complex (interleaved) by-reference
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_x_stride,
+			    stride_type in_y_stride,
+			    stride_type in_z_stride,
+			    std::complex<T> *out,
+			    stride_type out_x_stride,
+			    stride_type out_y_stride,
+			    stride_type out_z_stride,
+			    length_type x_length,
+			    length_type y_length,
+			    length_type z_length) = 0;
+  /// complex (split) by-reference
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_x_stride,
+			    stride_type in_y_stride,
+			    stride_type in_z_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_x_stride,
+			    stride_type out_y_stride,
+			    stride_type out_z_stride,
+			    length_type x_length,
+			    length_type y_length,
+			    length_type z_length) = 0;
+};
+
+/// FFTM
+template <typename inT, typename outT, bool Row, bool Fwd> class fftm;
+
+/// real forward FFTM
+template <typename T, bool Row>
+class fftm<T, std::complex<T>, Row, true> : public backend_base<2, T>
+{
+public:
+  virtual ~fftm() {}
+  /// real -> complex (interleaved) by-reference
+  virtual void by_reference(T *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+  /// real -> complex (split) by-reference
+  virtual void by_reference(T *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+};
+
+/// real inverse FFTM
+template <typename T, bool Row>
+class fftm<std::complex<T>, T, Row, false> : public backend_base<2, T>
+{
+public:
+  virtual ~fftm() {}
+  /// complex (interleaved) -> real by-reference
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+  /// complex (split) -> real by-reference
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+};
+
+/// complex FFTM
+template <typename T, bool Row, bool Fwd>
+class fftm<std::complex<T>, std::complex<T>, Row, Fwd> : public backend_base<2, T>
+{
+public:
+  virtual ~fftm() {}
+  /// complex (interleaved) in-place
+  virtual void in_place(std::complex<T> *inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols) = 0;
+  /// complex (split) in-place
+  virtual void in_place(std::pair<T *, T *> inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols) = 0;
+  /// complex (interleaved) by-reference
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+  /// complex (split) by-reference
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols) = 0;
+};
+
+} // namespace vsip::impl::fft
+} // namespace vsip::impl
+} // namespace vsip
+
+#endif
Index: src/vsip/impl/fft/factory.hpp
===================================================================
RCS file: src/vsip/impl/fft/factory.hpp
diff -N src/vsip/impl/fft/factory.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fft/factory.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,218 @@
+/* Copyright (c) 2005 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/fft/factory.hpp
+    @author  Stefan Seefeld
+    @date    2006-02-20
+    @brief   VSIPL++ Library: Fft backend dispatch harness.
+*/
+
+#ifndef VSIP_IMPL_FFT_FACTORY_HPP
+#define VSIP_IMPL_FFT_FACTORY_HPP
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/fft/util.hpp>
+#include <vsip/impl/metaprogramming.hpp>
+#include <vsip/impl/type_list.hpp>
+#include <memory>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace fft
+{
+
+/// evaluator template.
+/// This needs to be provided for each tag in the LibraryTagList.
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT,
+	  typename Tag>
+struct evaluator
+{
+  static bool const ct_valid = false;
+};
+
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag = typename TagList::first,
+	  typename Rest = typename TagList::rest,
+	  typename Eval = evaluator<Dim, inT, outT, sD, rmT, Tag>,
+	  bool CtValid = Eval::ct_valid>
+struct factory;
+
+/// In case the compile-time check passes, we decide at run-time whether
+/// or not to use this backend.
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag,
+	  typename Rest,
+	  typename Eval>
+struct factory<Dim, inT, outT, sD, rmT,
+	       TagList, Tag, Rest, Eval, true>
+{
+  typedef backend<Dim, inT, outT,
+		  axis<inT, outT, sD>::value,
+		  direction<inT, outT, sD>::value> interface;
+  static std::auto_ptr<interface> 
+  create(vsip::Domain<Dim> const& dom, typename interface::scalar_type scale)
+  {
+    if (Eval::rt_valid(dom)) return Eval::create(dom, scale);
+    else return factory<Dim, inT, outT, sD, rmT, Rest>::create(dom, scale);
+  }
+};
+
+/// In case the compile-time check fails, we continue the search
+/// directly at the next entry in the type list.
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag,
+	  typename Rest,
+	  typename Eval>
+struct factory<Dim, inT, outT, sD, rmT, TagList, Tag, Rest, Eval, false>
+  : factory<Dim, inT, outT, sD, rmT, Rest>
+{};
+
+/// Terminator. Instead of passing on to the next element
+/// it aborts the program. It is a program error to define
+/// callback lists that can't handle a given expression.
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag,
+	  typename Eval>
+struct factory<Dim, inT, outT, sD, rmT, TagList, Tag, None_type, Eval, true>
+{
+  typedef backend<Dim, inT, outT,
+		  axis<inT, outT, sD>::value,
+		  direction<inT, outT, sD>::value> interface;
+  static std::auto_ptr<interface> 
+  create(Domain<Dim> const &dom, typename interface::scalar_type scale)
+  {
+    if (Eval::rt_valid(dom)) return Eval::create(dom, scale);
+    assert(0);
+    return std::auto_ptr<interface>();
+  }
+};
+
+} // namespace vsip::impl::fft
+
+namespace fftm
+{
+
+/// evaluator template.
+/// This needs to be provided for each tag in the LibraryTagList.
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT,
+	  typename Tag>
+struct evaluator
+{
+  static bool const ct_valid = false;
+};
+
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag = typename TagList::first,
+	  typename Rest = typename TagList::rest,
+	  typename Eval = evaluator<inT, outT, A, D, rmT, Tag>,
+	  bool CtValid = Eval::ct_valid>
+struct factory;
+
+/// In case the compile-time check passes, we decide at run-time whether
+/// or not to use this backend.
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag,
+	  typename Rest,
+	  typename Eval>
+struct factory<inT, outT, A, D, rmT, TagList, Tag, Rest, Eval, true>
+{
+  typedef fft::fftm<inT, outT, A == 0, D == -2> interface;
+  static std::auto_ptr<interface> 
+  create(vsip::Domain<2> const& dom, typename interface::scalar_type scale)
+  {
+    if (Eval::rt_valid(dom)) return Eval::create(dom, scale);
+    else return factory<inT, outT, A, D, rmT, Rest>::create(dom, scale);
+  }
+};
+
+/// In case the compile-time check fails, we continue the search
+/// directly at the next entry in the type list.
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag,
+	  typename Rest,
+	  typename Eval>
+struct factory<inT, outT, A, D, rmT, TagList, Tag, Rest, Eval, false>
+  : factory<inT, outT, A, D, rmT, Rest>
+{};
+
+/// Terminator. Instead of passing on to the next element
+/// it aborts the program. It is a program error to define
+/// callback lists that can't handle a given expression.
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT,
+	  typename TagList,
+	  typename Tag,
+	  typename Eval>
+struct factory<inT, outT, A, D, rmT, TagList, Tag, None_type, Eval, true>
+{
+  typedef fft::fftm<inT, outT, A == 0, D == -2> interface;
+  static std::auto_ptr<interface> 
+  create(Domain<2> const &dom, typename interface::scalar_type scale)
+  {
+    if (Eval::rt_valid(dom)) return Eval::create(dom, scale);
+    assert(0);
+    return std::auto_ptr<interface>();
+  }
+};
+
+} // namespace vsip::impl::fftm
+
+} // namespace vsip::impl
+} // namespace vsip
+
+#endif
Index: src/vsip/impl/fft/util.hpp
===================================================================
RCS file: src/vsip/impl/fft/util.hpp
diff -N src/vsip/impl/fft/util.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fft/util.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,111 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/fft/util.cpp
+    @author  Stefan Seefeld
+    @date    2006-02-21
+    @brief   VSIPL++ Library: FFT common infrastructure used by all 
+    implementations.
+*/
+
+#ifndef VSIP_IMPL_FFT_UTIL_HPP
+#define VSIP_IMPL_FFT_UTIL_HPP
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/fast-block.hpp>
+#include <vsip/impl/view_traits.hpp>
+#include <iostream>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace fft
+{
+
+/// Determine the direction (forward or inverse) of a given Fft
+/// from its parameters.
+template <typename inT, typename outT, int sD> struct direction;
+template <typename T, int sD>
+struct direction<T, std::complex<T>, sD> { static bool const value = true;};
+template <typename T, int sD>
+struct direction<std::complex<T>, T, sD> { static bool const value = false;};
+template <typename T>
+struct direction<T, T, -2> { static bool const value = true;};
+template <typename T>
+struct direction<T, T, -1> { static bool const value = false;};
+
+/// Determine the 'special dimension' of a real fft
+/// from its parameters.
+template <typename inT, typename outT, int sD> 
+struct axis { static int const value = sD;};
+template <typename T, int sD>
+struct axis<T, T, sD> { static int const value = 0;};
+
+/// Device to calculate the size of the input and output blocks
+/// for complex and real ffts.
+template <dimension_type D, typename inT, typename outT, int A>
+struct io_size
+{
+  static Domain<D> size(Domain<D> const &dom) { return dom;}
+};
+template <dimension_type D, typename T, int A>
+struct io_size<D, std::complex<T>, T, A>
+{
+  static Domain<D> size(Domain<D> const &dom) 
+  {
+    Domain<D> retn(dom);
+    Domain<1> &mod = retn.impl_at(A);
+    mod = Domain<1>(0, 1, mod.size() / 2 + 1); 
+    return retn;
+  }
+};
+
+template <typename View>
+View
+new_view(vsip::Domain<1> const& dom) { return View(dom.size());} 
+
+template <typename View>
+View 
+new_view(vsip::Domain<2> const& dom)
+{ return View(dom[0].size(), dom[1].size());}
+
+template <typename View>
+View  
+new_view(vsip::Domain<3> const& dom)
+{ return View(dom[0].size(), dom[1].size(), dom[2].size());}
+
+
+
+//
+template<typename T, typename BlockT>
+struct result
+{
+  static dimension_type const dim = BlockT::dim;
+  typedef typename
+  // FIXME: Allow cmplx_split, to match split input.
+  impl::Fast_block<dim, T, 
+		   Layout<dim, tuple<0,1,2>,
+			  impl::Stride_unit_dense,
+			  typename Block_layout<BlockT>::complex_type>,
+		   typename BlockT::map_type> block_type;
+
+  typedef typename View_of_dim<dim, T, block_type>::type view_type;
+
+  static view_type create(Domain<dim> const &dom)
+  { return new_view<view_type>(dom);}
+};
+
+} // namespace vsip::impl::fft
+} // namespace vsip::impl
+} // namespace vsip
+
+#endif
Index: src/vsip/impl/fft/workspace.hpp
===================================================================
RCS file: src/vsip/impl/fft/workspace.hpp
diff -N src/vsip/impl/fft/workspace.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fft/workspace.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,240 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/fft/workspace.cpp
+    @author  Stefan Seefeld
+    @date    2006-02-21
+    @brief   VSIPL++ Library: FFT common infrastructure used by all 
+    implementations.
+*/
+
+#ifndef VSIP_IMPL_FFT_WORKSPACE_HPP
+#define VSIP_IMPL_FFT_WORKSPACE_HPP
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/view_traits.hpp>
+#include <vsip/impl/adjust-layout.hpp>
+#include <iostream>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace fft
+{
+
+/// This provides the temporary data as well as the
+/// conversion logic from blocks to arrays as expected
+/// by fft backends.
+template <dimension_type D, typename inT, typename outT, dimension_type A> 
+class workspace;
+
+template <typename inT, typename outT>
+class workspace<1, inT, outT, 0>
+{
+public:
+  workspace(Domain<1> const &, Domain<1> const &) {}
+  
+  template <typename BE, typename Block0, typename Block1>
+  void by_reference(BE *backend, Block0 const &in, Block1 &out)
+  {
+    typedef typename Block_layout<Block0>::layout_type in_layout;
+    typedef typename Block_layout<Block1>::layout_type out_layout;
+
+//     static bool const is_split  =
+//       impl::Type_equal<typename impl::Block_layout<Block0>::complex_type,
+//                        impl::Cmplx_split_fmt>::value;
+    Ext_data<Block0, in_layout,No_count_policy,Copy_access_tag> 
+      in_ext(in, SYNC_IN);
+    Ext_data<Block1, out_layout,No_count_policy,Copy_access_tag>
+      out_ext(out, SYNC_OUT);
+    // If this is a real FFT we need to make sure we pass N, not N/2+1 as size.
+    length_type size = std::max(in_ext.size(0), out_ext.size(0));
+    if (in_ext.stride(0) == 1 && out_ext.stride(0) == 1)
+      backend->by_reference(in_ext.data(), in_ext.stride(0),
+			    out_ext.data(), out_ext.stride(0), size);
+  }
+
+  template <typename BE, typename BlockT>
+  void in_place(BE *backend, BlockT &inout)
+  {
+    vsip::impl::Ext_data<BlockT> inout_ext(inout, vsip::impl::SYNC_INOUT);
+    if (inout_ext.stride(0) == 1)
+      backend->in_place(inout_ext.data(), inout_ext.stride(0), inout_ext.size(0));
+  }
+private:
+  
+};
+
+/// workspace for column-wise FFTMs (and column-first 2D FFTs). As all backends
+/// support unit-stride in the major dimension, this is optimized for col-major
+/// storage.
+template <typename inT, typename outT>
+class workspace<2, inT, outT, 0>
+{
+  // TODO: Does this really have to be a block ?
+  // A raw array ought to be sufficient...
+  typedef Fast_block<2, inT,
+		     Layout<2, tuple<1,0,2>, Stride_unit_dense, Cmplx_inter_fmt>,
+		     Local_map> in_buffer_type;
+  typedef Fast_block<2, outT,
+		     Layout<2, tuple<1,0,2>, Stride_unit_dense, Cmplx_inter_fmt>,
+		     Local_map> out_buffer_type;
+
+public:
+  workspace(Domain<2> const &in, Domain<2> const &out)
+    : input_buffer_(in), output_buffer_(out) {}
+  
+  template <typename BE, typename Block0, typename Block1>
+  void by_reference(BE *backend, Block0 const &in, Block1 &out)
+  {
+    typedef typename Block_layout<Block0>::layout_type in_l;
+    typedef typename Block_layout<Block1>::layout_type out_l;
+
+    typedef Layout<2, tuple<1,0,2>, Stride_unit, typename in_l::complex_type>
+      in_trans_layout;
+    typedef Layout<2, tuple<1,0,2>, Stride_unit, typename out_l::complex_type>
+      out_trans_layout;
+
+    typedef typename Adjust_layout<typename Block0::value_type,
+      in_trans_layout, in_l>::type
+      in_layout;
+    typedef typename Adjust_layout<typename Block0::value_type,
+      out_trans_layout, out_l>::type
+      out_layout;
+
+    Ext_data<Block0, in_layout,No_count_policy,Copy_access_tag>
+      in_ext(in, SYNC_IN, Ext_data<in_buffer_type>(input_buffer_).data());
+    Ext_data<Block1, out_layout,No_count_policy,Copy_access_tag>
+      out_ext(out, SYNC_OUT, Ext_data<out_buffer_type>(output_buffer_).data());
+    // If this is a real FFT we need to make sure we pass N, not N/2+1 as size.
+    length_type rows = std::max(in_ext.size(0), out_ext.size(0));
+    length_type cols = std::max(in_ext.size(1), out_ext.size(1));
+    // These blocks are col-major, so we always accept them if their rows have
+    // unit-stride.
+    if (in_ext.stride(0) == 1 && out_ext.stride(0) == 1)
+      backend->by_reference(in_ext.data(), in_ext.stride(0), in_ext.stride(1),
+			    out_ext.data(), out_ext.stride(0), out_ext.stride(1),
+			    rows, cols);
+  }
+  template <typename BE, typename BlockT>
+  void in_place(BE *backend, BlockT &inout)
+  {
+    typedef typename Block_layout<BlockT>::layout_type l;
+    typedef Layout<2, tuple<1,0,2>, Stride_unit, typename l::complex_type>
+      trans_layout;
+    typedef typename Adjust_layout<typename BlockT::value_type,
+      trans_layout, l>::type
+      layout;
+    Ext_data<BlockT, layout, No_count_policy,Copy_access_tag> 
+      inout_ext(inout, SYNC_INOUT, Ext_data<in_buffer_type>(input_buffer_).data());
+    // This block is col-major, so we always accept it if its rows have
+    // unit-stride.
+    if (inout_ext.stride(0) == 1)
+      backend->in_place(inout_ext.data(), inout_ext.stride(0), inout_ext.stride(1),
+			inout_ext.size(0), inout_ext.size(1));
+  }
+private:
+  in_buffer_type input_buffer_;
+  out_buffer_type output_buffer_;
+};
+
+/// workspace for row-wise FFTMs (and row-first 2D FFTs). As all backends
+/// support unit-stride in the major dimension, this is optimized for row-major
+/// storage.
+template <typename inT, typename outT>
+class workspace<2, inT, outT, 1>
+{
+  typedef Fast_block<2, inT,
+		     Layout<2, tuple<0,1,2>, Stride_unit_dense, Cmplx_inter_fmt>,
+		     Local_map> in_buffer_type;
+  typedef Fast_block<2, outT,
+		     Layout<2, tuple<0,1,2>, Stride_unit_dense, Cmplx_inter_fmt>,
+		     Local_map> out_buffer_type;
+
+public:
+  workspace(Domain<2> const &in, Domain<2> const &out)
+    : input_buffer_(in), output_buffer_(out) {}
+  
+  template <typename BE, typename Block0, typename Block1>
+  void by_reference(BE *backend, Block0 const &in, Block1 &out)
+  {
+    typedef typename Block_layout<Block0>::layout_type in_l;
+    typedef typename Block_layout<Block1>::layout_type out_l;
+
+    typedef Layout<2, tuple<0,1,2>, Stride_unit, typename in_l::complex_type> 
+      in_trans_layout;
+    typedef Layout<2, tuple<0,1,2>, Stride_unit, typename out_l::complex_type> 
+      out_trans_layout;
+
+    typedef typename Adjust_layout<typename Block0::value_type,
+      in_trans_layout, in_l>::type
+      in_layout;
+    typedef typename Adjust_layout<typename Block0::value_type,
+      out_trans_layout, out_l>::type
+      out_layout;
+
+    Ext_data<Block0, in_layout,No_count_policy,Copy_access_tag> 
+      in_ext(in, SYNC_IN, Ext_data<in_buffer_type>(input_buffer_).data());
+    Ext_data<Block1, out_layout,No_count_policy,Copy_access_tag> 
+      out_ext(out, SYNC_OUT, Ext_data<out_buffer_type>(output_buffer_).data());
+    // If this is a real FFT we need to make sure we pass N, not N/2+1 as size.
+    length_type rows = std::max(in_ext.size(0), out_ext.size(0));
+    length_type cols = std::max(in_ext.size(1), out_ext.size(1));
+    if (in_ext.stride(1) == 1 && out_ext.stride(1) == 1)
+      backend->by_reference(in_ext.data(), in_ext.stride(0), in_ext.stride(1),
+			    out_ext.data(), out_ext.stride(0), out_ext.stride(1),
+			    rows, cols);
+  }
+  template <typename BE, typename BlockT>
+  void in_place(BE *backend, BlockT &inout)
+  {
+    typedef typename Block_layout<BlockT>::layout_type l;
+    typedef Layout<2, tuple<0,1,2>, Stride_unit, typename l::complex_type>
+      trans_layout;
+    typedef typename Adjust_layout<typename BlockT::value_type,
+      trans_layout, l>::type
+      layout;
+    Ext_data<BlockT, layout,No_count_policy,Copy_access_tag>
+      inout_ext(inout, SYNC_INOUT, Ext_data<in_buffer_type>(input_buffer_).data());
+    if (inout_ext.stride(1) == 1)
+      backend->in_place(inout_ext.data(), inout_ext.stride(0), inout_ext.stride(1),
+			inout_ext.size(0), inout_ext.size(1));
+  }
+
+private:
+  in_buffer_type input_buffer_;
+  out_buffer_type output_buffer_;
+};
+
+template <typename inT, typename outT, dimension_type A>
+class workspace<3, inT, outT, A>
+{
+public:
+  workspace(Domain<3> const &, Domain<3> const &) {}
+  
+  template <typename BE, typename Block0, typename Block1>
+  void by_reference(BE *backend, Block0 const &in, Block1 &out)
+  {
+  }
+  template <typename BE, typename BlockT>
+  void in_place(BE *backend, BlockT &in)
+  {
+  }
+private:
+  
+};
+
+} // namespace vsip::impl::fft
+} // namespace vsip::impl
+} // namespace vsip
+
+#endif
Index: src/vsip/impl/fftw3/fft.cpp
===================================================================
RCS file: src/vsip/impl/fftw3/fft.cpp
diff -N src/vsip/impl/fftw3/fft.cpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fftw3/fft.cpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,36 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/fftw3/fft.cpp
+    @author  Stefan Seefeld
+    @date    2006-03-06
+    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with 
+             FFTW3.
+*/
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/domain.hpp>
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/fft/util.hpp>
+#include <vsip/impl/fftw3/fft.hpp>
+#include <vsip/impl/fns_scalar.hpp>
+# include <fftw3.h>
+#include <complex>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace fftw3
+{
+
+} // namespace vsip::impl::fftw3
+} // namespace vsip::impl
+} // namespace vsip
Index: src/vsip/impl/fftw3/fft.hpp
===================================================================
RCS file: src/vsip/impl/fftw3/fft.hpp
diff -N src/vsip/impl/fftw3/fft.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/fftw3/fft.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,96 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/ipp/fft.hpp
+    @author  Stefan Seefeld
+    @date    2006-03-06
+    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with FFTW3.
+*/
+
+#ifndef VSIP_IMPL_FFTW3_FFT_HPP
+#define VSIP_IMPL_FFTW3_FFT_HPP
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/domain.hpp>
+#include <vsip/impl/fft/factory.hpp>
+#include <vsip/impl/fft/util.hpp>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace ipp
+{
+
+/// These are the entry points into the IPP FFT bridge.
+template <typename I, dimension_type D, typename S>
+std::auto_ptr<I>
+create(Domain<D> const &dom, S scale);
+
+} // namespace vsip::impl::sal
+
+namespace fft
+{
+struct Fftw3_tag;
+
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<Dim, inT, outT, sD, rmT, Fftw3_tag>
+{
+  static bool const ct_valid = false;
+  static bool rt_valid(Domain<Dim> const &dom)
+  {
+    return false;
+  }
+//   static std::auto_ptr<backend<Dim, inT, outT,
+// 			       axis<inT, outT, sD>::value,
+// 			       direction<inT, outT, sD>::value> >
+//   create(Domain<Dim> const &dom, typename impl::Scalar_of<inT>::type scale)
+//   {
+//     return sal::create<backend<Dim, inT, outT,
+//                                axis<inT, outT, sD>::value,
+//                                direction<inT, outT, sD>::value> >
+//       (dom, scale);
+//   }
+};
+
+} // namespace vsip::impl::fft
+
+namespace fftm
+{
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<inT, outT, A, D, rmT, fft::Fftw3_tag>
+{
+  // TODO: Implement
+  static bool const ct_valid = false;
+  static bool rt_valid(Domain<2> const &dom)
+  {
+    return false;
+  }
+//   static std::auto_ptr<fft::fftm<inT, outT, A == 0, D == -2> > 
+//   create(Domain<2> const &dom, typename impl::Scalar_of<inT>::type scale)
+//   {
+//     return sal::create<fft::fftm<inT, outT, A == 0, D == -2> >(dom, scale);
+//   }
+};
+
+} // namespace vsip::impl::fftm
+} // namespace vsip::impl
+} // namespace vsip
+
+#endif
+
Index: src/vsip/impl/ipp/fft.cpp
===================================================================
RCS file: src/vsip/impl/ipp/fft.cpp
diff -N src/vsip/impl/ipp/fft.cpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/ipp/fft.cpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,37 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/ipp/fft.cpp
+    @author  Stefan Seefeld
+    @date    2006-03-06
+    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with 
+             Intel's IPP.
+*/
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/domain.hpp>
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/fft/util.hpp>
+#include <vsip/impl/ipp/fft.hpp>
+#include <vsip/impl/fns_scalar.hpp>
+# include <ipps.h>
+# include <ippi.h>
+#include <complex>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace ipp
+{
+
+} // namespace vsip::impl::ipp
+} // namespace vsip::impl
+} // namespace vsip
Index: src/vsip/impl/ipp/fft.hpp
===================================================================
RCS file: src/vsip/impl/ipp/fft.hpp
diff -N src/vsip/impl/ipp/fft.hpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/ipp/fft.hpp	7 Mar 2006 03:55:20 -0000
@@ -0,0 +1,97 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/ipp/fft.hpp
+    @author  Stefan Seefeld
+    @date    2006-03-06
+    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with 
+             Intel's IPP.
+*/
+
+#ifndef VSIP_IMPL_IPP_FFT_HPP
+#define VSIP_IMPL_IPP_FFT_HPP
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/domain.hpp>
+#include <vsip/impl/fft/factory.hpp>
+#include <vsip/impl/fft/util.hpp>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace ipp
+{
+
+/// These are the entry points into the IPP FFT bridge.
+template <typename I, dimension_type D, typename S>
+std::auto_ptr<I>
+create(Domain<D> const &dom, S scale);
+
+} // namespace vsip::impl::sal
+
+namespace fft
+{
+struct Intel_ipp_tag;
+
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<Dim, inT, outT, sD, rmT, Intel_ipp_tag>
+{
+  static bool const ct_valid = false;
+  static bool rt_valid(Domain<Dim> const &dom)
+  {
+    return false;
+  }
+//   static std::auto_ptr<backend<Dim, inT, outT,
+// 			       axis<inT, outT, sD>::value,
+// 			       direction<inT, outT, sD>::value> >
+//   create(Domain<Dim> const &dom, typename impl::Scalar_of<inT>::type scale)
+//   {
+//     return sal::create<backend<Dim, inT, outT,
+//                                axis<inT, outT, sD>::value,
+//                                direction<inT, outT, sD>::value> >
+//       (dom, scale);
+//   }
+};
+
+} // namespace vsip::impl::fft
+
+namespace fftm
+{
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<inT, outT, A, D, rmT, fft::Intel_ipp_tag>
+{
+  // TODO: Implement
+  static bool const ct_valid = false;
+  static bool rt_valid(Domain<2> const &dom)
+  {
+    return false;
+  }
+//   static std::auto_ptr<fft::fftm<inT, outT, A == 0, D == -2> > 
+//   create(Domain<2> const &dom, typename impl::Scalar_of<inT>::type scale)
+//   {
+//     return sal::create<fft::fftm<inT, outT, A == 0, D == -2> >(dom, scale);
+//   }
+};
+
+} // namespace vsip::impl::fftm
+} // namespace vsip::impl
+} // namespace vsip
+
+#endif
+
Index: src/vsip/impl/sal/fft.cpp
===================================================================
RCS file: src/vsip/impl/sal/fft.cpp
diff -N src/vsip/impl/sal/fft.cpp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ src/vsip/impl/sal/fft.cpp	7 Mar 2006 03:55:21 -0000
@@ -0,0 +1,1050 @@
+/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
+
+/** @file    vsip/impl/sal/fft.cpp
+    @author  Stefan Seefeld
+    @date    2006-02-20
+    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with 
+             Mercury's SAL.
+*/
+
+/***********************************************************************
+  Included Files
+***********************************************************************/
+
+#include <vsip/support.hpp>
+#include <vsip/domain.hpp>
+#include <vsip/impl/fft/backend.hpp>
+#include <vsip/impl/fft/util.hpp>
+#include <vsip/impl/sal/fft.hpp>
+#include <vsip/impl/fns_scalar.hpp>
+#include <sal.h>
+#include <complex>
+
+/***********************************************************************
+  Declarations
+***********************************************************************/
+
+namespace vsip
+{
+namespace impl
+{
+namespace sal
+{
+
+/// Compare two floating-point values for equality.
+///
+/// Algorithm from:
+///    www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
+
+template <typename T>
+bool
+almost_equal(T A, T B, T rel_epsilon = 1e-4, T abs_epsilon = 1e-6)
+{
+  if (fn::mag(A - B) < abs_epsilon)
+    return true;
+
+  T relative_error;
+
+  if (fn::mag(B) > fn::mag(A))
+    relative_error = fn::mag((A - B) / B);
+  else
+    relative_error = fn::mag((B - A) / A);
+
+  return (relative_error <= rel_epsilon);
+}
+
+
+// TODO: figure out what exactly this ESAL flag is and tune it.
+//       (requires SAL, not CSAL, and real h/w)
+long const ESAL = 0;
+
+inline unsigned long
+ilog2(length_type size)    // assume size = 2^n, != 0, return n.
+{
+  unsigned int n = 0;
+  while (size >>= 1) ++n;
+  return n;
+}
+
+inline length_type
+get_sizes(Domain<1> const &dom, length_type *size, length_type *l2)
+{
+  *size = dom.size();
+  *l2 = ilog2(*size);
+  return *l2;
+}
+inline length_type
+get_sizes(Domain<2> const &dom, length_type *size, length_type *l2)
+{
+  size[0] = dom[0].size();
+  size[1] = dom[1].size();
+  l2[0] = ilog2(size[0]);
+  l2[1] = ilog2(size[1]);
+  return std::max(l2[0], l2[1]);
+}
+inline length_type
+get_sizes(Domain<3> const &dom, length_type *size, length_type *l2)
+{
+  size[0] = dom[0].size();
+  size[1] = dom[1].size();
+  size[2] = dom[2].size();
+  l2[0] = ilog2(size[0]);
+  l2[1] = ilog2(size[1]);
+  l2[2] = ilog2(size[2]);
+  return std::max(l2[0], std::max(l2[1], l2[2]));
+}
+
+
+template <typename T>
+inline void 
+unpack(T *data, unsigned long N, unsigned long M)
+{
+  // unpack the data (see SAL reference, figure 3.6, for details).
+  unsigned long const M2 = M/2 + 1;
+  T t10r = data[N];         // (1, 0).real()
+  T t10i = data[N + 1];     // (1, 0).imag()
+  data[N + 1] = 0.;         // (0, 0).imag()
+  data[N - 2] = data[1];    // (0,-1).real()
+  data[N - 1] = 0.;         // (0,-1).imag()
+  data[1] = 0.;
+  for (unsigned long r = 1; r != M2 - 1; ++r)
+  {
+    // set last row (r,-1)
+    data[(r + 1) * N - 2] = data[2 * r * N + 1];
+    data[(r + 1) * N - 1] = data[(2 * r + 1) * N + 1];
+    data[2 * r * N + 1] = 0.;
+    data[(2 * r + 1) * N + 1] = 0.;
+    // set first row (r, 0)
+    data[r * N] = data[2 * r * N];
+    data[r * N + 1] = data[(2 * r + 1) * N];
+    data[2 * r * N] = 0.;
+    data[(2 * r + 1) * N] = 0.;
+  }
+  data[(M2 - 1) * N] = t10r;
+  data[(M2 - 1) * N + 1] = 0.;
+  data[M2  * N - 2] = t10i;
+  data[M2  * N - 1] = 0;
+
+  // Now fill in the missing cells by symmetry.
+  for (unsigned long r = M2; r != M; ++r)
+  {
+    // first column (r, 0)
+    data[r * N] = data[(M - r) * N];
+    data[r * N + 1] = - data[(M - r) * N + 1];
+    // last column (r, -1)
+    data[(r + 1) * N - 2] = data[(M - r + 1) * N - 2];
+    data[(r + 1) * N - 1] = - data[(M - r + 1) * N - 1];
+  }
+}
+
+template <typename T>
+inline void 
+pack(T *data, unsigned long N, unsigned long M)
+{
+  unsigned long const M2 = M/2 + 1;
+
+  T t10i = data[M2  * N - 2];
+  T t10r = data[(M2 - 1) * N];
+
+  for (unsigned long r = M2 - 2; r; --r)
+  {
+    // pack first row (r, 0)
+    data[2 * r * N] = data[r * N];
+    data[(2 * r + 1) * N] = data[r * N + 1];
+    // pack last row (r,-1)
+    data[2 * r * N + 1] = data[(r + 1) * N - 2];
+    data[(2 * r + 1) * N + 1] = data[(r + 1) * N - 1];
+  }
+
+  data[N] = t10r;         // (1, 0).real()
+  data[N + 1] = t10i;     // (1, 0).imag()
+  data[1] = data[N - 2];  // (0,-1).real()
+}
+
+template <dimension_type D, bool single> struct fft_base;
+
+template <dimension_type D> struct fft_base<D, true /* single precision */>
+{
+  typedef float rtype;
+  typedef COMPLEX ctype;
+  typedef COMPLEX_SPLIT ztype;
+
+  fft_base(Domain<D> const &dom, long options, rtype scale)
+    : scale_(scale)
+  {
+    length_type size = get_sizes(dom, size_, l2size_);
+    unsigned long nbytes = 0;
+    fft_setup(size, options, &setup_, &nbytes);
+    buffer_ = alloc_align<ctype>(32, dom.size());
+  }
+  ~fft_base() 
+  {
+    free_align(buffer_);
+    fft_free(&setup_);
+  }
+  
+  void scale(std::complex<rtype> *data, length_type size, rtype s)
+  {
+    rtype *d = reinterpret_cast<rtype*>(data);
+    vsmulx(d, 1, &s, d, 1, 2 * size, ESAL);
+  }
+  void scale(rtype *data, length_type size, rtype s)
+  {
+    vsmulx(data, 1, &s, data, 1, size, ESAL);
+  }
+
+  void cip(std::complex<rtype> *data, long dir)
+  {
+    fft_cipx(&setup_, reinterpret_cast<ctype *>(data), 2, l2size_[0], dir, ESAL);
+  }
+  void zip(std::pair<rtype*, rtype*> inout, stride_type stride, long dir)
+  {
+    ztype data = {inout.first, inout.second};
+    ztype tmp = {reinterpret_cast<rtype*>(buffer_),
+		 reinterpret_cast<rtype*>(buffer_) + size_[0]};
+    fft_ziptx(&setup_, &data, stride, &tmp, l2size_[0], dir, ESAL);
+  }
+  void cip2d(std::complex<rtype> *data, dimension_type axis, long dir)
+  {
+    fft2d_cipx(&setup_, reinterpret_cast<ctype *>(data), 2, 0,
+	       l2size_[axis], l2size_[1 - axis], dir, ESAL);
+  }
+  void rop(rtype *in, std::complex<rtype> *out)
+  {
+    rtype *data = reinterpret_cast<rtype*>(out);
+    fft_roptx(&setup_, in, 1, data, 1, reinterpret_cast<rtype *>(buffer_), 
+	     l2size_[0], FFT_FORWARD, ESAL);
+    // unpack the data (see SAL reference for details).
+    int const N = size_[0] + 2;
+    data[N - 2] = data[1];
+    data[1] = 0.f;
+    data[N - 1] = 0.f;
+  }
+  void rop(std::complex<rtype> *in, rtype *out)
+  {
+    rtype *data = reinterpret_cast<rtype*>(in);
+    // pack the data (see SAL reference for details).
+    int const N = size_[0] + 2;
+    data[1] = data[N - 2];
+    data[N - 2] = data[N - 1] = 0.f;
+    fft_ropx(&setup_, data, 1, out, 1, l2size_[0], FFT_INVERSE, ESAL);
+  }
+  void cop(std::complex<rtype> *in_arg, std::complex<rtype> *out_arg, long dir)
+  {
+    ctype *in = reinterpret_cast<ctype*>(in_arg);
+    ctype *out = reinterpret_cast<ctype*>(out_arg);
+    fft_coptx(&setup_, in, 2, out, 2, buffer_, l2size_[0], dir, ESAL);
+  }
+  void zop(std::pair<rtype*, rtype*> in_arg, stride_type in_stride,
+	   std::pair<rtype*, rtype*> out_arg, stride_type out_stride, long dir)
+  {
+    ztype in = {in_arg.first, in_arg.second};
+    ztype out = {out_arg.first, out_arg.second};
+    ztype tmp = {reinterpret_cast<rtype*>(buffer_),
+		 reinterpret_cast<rtype*>(buffer_) + size_[0]};
+    fft_zoptx(&setup_, &in, in_stride, &out, out_stride,
+	      &tmp, l2size_[0], dir, ESAL);
+  }
+  void rop2d(rtype *in, stride_type in_stride,
+	     std::complex<rtype> *out_arg, stride_type out_stride,
+	     dimension_type axis)
+  {
+    rtype *out = reinterpret_cast<rtype*>(out_arg);
+    fft2d_roptx(&setup_, in, 1, in_stride, out, 1, 2 * out_stride,
+		reinterpret_cast<rtype*>(buffer_),
+		l2size_[axis], l2size_[1 - axis], FFT_FORWARD, ESAL);
+    // unpack the data (see SAL reference, figure 3.6, for details).
+    unpack(out, size_[axis] + 2, size_[1 - axis]);
+  }
+  void rop2d(std::complex<rtype> *in_arg, stride_type in_stride,
+	     rtype *out, stride_type out_stride, dimension_type axis)
+  {
+    rtype *in = reinterpret_cast<rtype*>(in_arg);
+    // pack the data (see SAL reference, figure 3.6, for details).
+    pack(in, size_[axis] + 2, size_[1 - axis]);
+    fft2d_ropx(&setup_, in, 1, 2 * in_stride, out, 1, out_stride,
+	       l2size_[axis], l2size_[1 - axis], FFT_INVERSE, ESAL);
+  }
+  void cop2d(std::complex<rtype> *in_arg, stride_type in_stride,
+	     std::complex<rtype> *out_arg, stride_type out_stride, long dir)
+  {
+    ctype *in = reinterpret_cast<ctype*>(in_arg);
+    ctype *out = reinterpret_cast<ctype*>(out_arg);
+    fft2d_coptx(&setup_, in, 2, 2 * in_stride, out, 2, 2 * out_stride,
+		reinterpret_cast<ctype*>(buffer_),
+		l2size_[1], l2size_[0], dir, ESAL);
+  }
+  void zop2d(std::pair<rtype*, rtype*> in_arg,
+	     stride_type in_r_stride, stride_type in_c_stride,
+	     std::pair<rtype*, rtype*> out_arg,
+	     stride_type out_r_stride, stride_type out_c_stride, long dir)
+  {
+    ztype in = {in_arg.first, in_arg.second};
+    ztype out = {out_arg.first, out_arg.second};
+    ztype tmp = {reinterpret_cast<rtype*>(buffer_),
+		 reinterpret_cast<rtype*>(buffer_) + size_[0]};
+    fft2d_zoptx(&setup_, &in, in_r_stride, in_c_stride,
+		&out, out_r_stride, out_c_stride,
+		&tmp, l2size_[1], l2size_[0], dir, ESAL);
+  }
+  void cipm(std::complex<rtype> *inout, stride_type stride,
+	    dimension_type axis, long dir)
+  {
+    fftm_cipx(&setup_, reinterpret_cast<ctype *>(inout),
+	      2, 2 * stride, l2size_[1 - axis], size_[axis], dir, ESAL);
+  }
+  void zipm(std::pair<rtype*, rtype*> inout,
+	    stride_type r_stride, stride_type c_stride,
+	    dimension_type axis, long dir)
+  {
+    ztype data = {inout.first, inout.second};
+    fftm_zipx(&setup_, &data, r_stride, c_stride,
+	      l2size_[1 - axis], size_[axis], dir, ESAL);
+  }
+  void ropm(rtype *in, stride_type in_stride,
+	    std::complex<rtype> *out_arg, stride_type out_stride,
+	    dimension_type axis)
+  {
+    rtype *out = reinterpret_cast<rtype*>(out_arg);
+    fftm_roptx(&setup_, in, 1, in_stride,
+	       out, 1, in_stride + 2,
+	       reinterpret_cast<rtype*>(buffer_),
+	       l2size_[1], size_[0], FFT_FORWARD, ESAL);
+    // Unpack the data (see SAL reference for details), and scale back by 1/2.
+    int const N = size_[1] + 2;
+    rtype scale = scale_ * 0.5f;
+    for (unsigned int i = 0; i != size_[0]; ++i, out += in_stride + 2)
+    {
+      out[N - 2] = out[1];
+      out[N - 1] = 0.;
+      out[1] = 0.f;
+      vsmulx(out, 1, &scale, out, 1, N, ESAL);
+    }
+  }
+  void ropm(std::complex<rtype> *in_arg, stride_type in_stride,
+	    rtype *out, stride_type out_stride,
+	    dimension_type axis)
+  {
+    rtype *in = reinterpret_cast<rtype*>(in_arg);
+    // Pack the data (see SAL reference for details).
+    int const N = size_[1] + 2;
+    for (unsigned int i = 0; i != size_[0]; ++i, in += in_stride + 2)
+    {
+      in[1] = in[N - 2];
+    }
+    in = reinterpret_cast<rtype*>(in_arg);
+    fftm_ropx(&setup_, in, 1, in_stride, out, 1, out_stride + 2,
+	      l2size_[axis], size_[1 - axis], FFT_INVERSE, ESAL);
+  }
+  void copm(std::complex<rtype> *in, stride_type in_stride,
+	    std::complex<rtype> *out, stride_type out_stride,
+	    dimension_type axis, long dir)
+  {
+    fftm_coptx(&setup_, reinterpret_cast<ctype *>(in), 2, 2 * in_stride,
+	       reinterpret_cast<ctype *>(out), 2, 2 * out_stride,
+	       reinterpret_cast<ctype *>(buffer_),
+	       l2size_[axis], size_[1 - axis], dir, ESAL);
+  }
+
+  FFT_setup setup_;
+  length_type size_[D];
+  length_type l2size_[D];
+  rtype scale_;
+  ctype *buffer_;
+};
+
+template <dimension_type D> struct fft_base<D, false /* single precision */>
+{
+  typedef double rtype;
+  typedef DOUBLE_COMPLEX ctype;
+  typedef DOUBLE_COMPLEX_SPLIT ztype;
+
+  fft_base(Domain<D> const &dom, long options, rtype scale)
+    : scale_(scale)
+  {
+    length_type size = get_sizes(dom, size_, l2size_);
+    unsigned long nbytes = 0;
+    fft_setupd(size, options, &setup_, &nbytes);
+    buffer_ = alloc_align<ctype>(32, dom.size());
+  }
+  ~fft_base() 
+  {
+    free_align(buffer_);
+    fft_freed(&setup_);
+  }
+  
+  void scale(std::complex<rtype> *data, length_type size, rtype s)
+  {
+    rtype *d = reinterpret_cast<rtype*>(data);
+    vsmuldx(d, 1, &s, d, 1, 2 * size, ESAL);
+  }
+  void scale(rtype *data, length_type size, rtype s)
+  {
+    vsmuldx(data, 1, &s, data, 1, size, ESAL);
+  }
+
+  void cip(std::complex<rtype> *data, long dir)
+  {
+    fft_cipdx(&setup_, reinterpret_cast<ctype *>(data), 2, l2size_[0], dir, ESAL);
+  }
+  void zip(std::pair<rtype*, rtype*> inout, stride_type stride, long dir)
+  {
+    ztype data = {inout.first, inout.second};
+    ztype tmp = {reinterpret_cast<rtype*>(buffer_),
+		 reinterpret_cast<rtype*>(buffer_) + size_[0]};
+    fft_ziptdx(&setup_, &data, stride, &tmp, l2size_[0], dir, ESAL);
+  }
+  void cip2d(std::complex<rtype> *data, dimension_type axis, long dir)
+  {
+    fft2d_cipdx(&setup_, reinterpret_cast<ctype *>(data), 2, 0,
+		l2size_[1 - axis], l2size_[axis], dir, ESAL);
+  }
+  void rop(rtype *in, std::complex<rtype> *out)
+  {
+    rtype *data = reinterpret_cast<rtype*>(out);
+    fft_roptdx(&setup_, in, 1, data, 1, reinterpret_cast<rtype *>(buffer_),
+	       l2size_[0], FFT_FORWARD, ESAL);
+    // unpack the data (see SAL reference for details).
+    int const N = size_[0] + 2;
+    data[N - 2] = data[1];
+    data[1] = 0.f;
+    data[N - 1] = 0.f;
+  }
+  void rop(std::complex<rtype> *in, rtype *out)
+  {
+    rtype *data = reinterpret_cast<rtype*>(in);
+    // pack the data (see SAL reference for details).
+    int const N = size_[0] + 2;
+    data[1] = data[N - 2];
+    data[N - 2] = data[N - 1] = 0.f;
+
+    fft_roptdx(&setup_, data, 1, out, 1, reinterpret_cast<rtype *>(buffer_),
+	       l2size_[0], FFT_INVERSE, ESAL);
+  }
+  void cop(std::complex<rtype> *in_arg, std::complex<rtype> *out_arg, long dir)
+  {
+    ctype *in = reinterpret_cast<ctype*>(in_arg);
+    ctype *out = reinterpret_cast<ctype*>(out_arg);
+    fft_copdx(&setup_, in, 2, out, 2, l2size_[0], dir, ESAL);
+  }
+  void zop(std::pair<rtype*, rtype*> in_arg, stride_type in_stride,
+	   std::pair<rtype*, rtype*> out_arg, stride_type out_stride, long dir)
+  {
+    ztype in = {in_arg.first, in_arg.second};
+    ztype out = {out_arg.first, out_arg.second};
+    ztype tmp = {reinterpret_cast<rtype*>(buffer_),
+		 reinterpret_cast<rtype*>(buffer_) + size_[0]};
+    fft_zoptdx(&setup_, &in, in_stride, &out, out_stride,
+	       &tmp, l2size_[0], dir, ESAL);
+  }
+  void rop2d(rtype *in, stride_type in_stride,
+	     std::complex<rtype> *out_arg, stride_type out_stride,
+	     dimension_type axis)
+  {
+    rtype *out = reinterpret_cast<rtype*>(out_arg);
+    fft2d_roptdx(&setup_, in, 1, in_stride, out, 1, 2 * out_stride,
+		 reinterpret_cast<rtype*>(buffer_),
+		 l2size_[axis], l2size_[1 - axis], FFT_FORWARD, ESAL);
+    // unpack the data (see SAL reference, figure 3.6, for details).
+    unpack(out, size_[axis] + 2, size_[1 - axis]);
+  }
+  void rop2d(std::complex<rtype> *in_arg, stride_type in_stride,
+	     rtype *out, stride_type out_stride, dimension_type axis)
+  {
+    rtype *in = reinterpret_cast<rtype*>(in_arg);
+    // pack the data (see SAL reference, figure 3.6, for details).
+    pack(in, size_[axis] + 2, size_[1 - axis]);
+    fft2d_ropdx(&setup_, in, 1, 2 * in_stride, out, 1, out_stride,
+		l2size_[axis], l2size_[1 - axis], FFT_INVERSE, ESAL);
+  }
+  void cop2d(std::complex<rtype> *in_arg, stride_type in_r_stride,
+	     std::complex<rtype> *out_arg, stride_type out_r_stride, long dir)
+  {
+    ctype *in = reinterpret_cast<ctype*>(in_arg);
+    ctype *out = reinterpret_cast<ctype*>(out_arg);
+    fft2d_coptdx(&setup_, in, 2, 2 * in_r_stride, out, 2, 2 * out_r_stride,
+		 reinterpret_cast<ctype*>(buffer_),
+		 l2size_[1], l2size_[0], dir, ESAL);
+  }
+  void zop2d(std::pair<rtype*, rtype*> in_arg,
+	     stride_type in_r_stride, stride_type in_c_stride,
+	     std::pair<rtype*, rtype*> out_arg,
+	     stride_type out_r_stride, stride_type out_c_stride, long dir)
+  {
+    ztype in = {in_arg.first, in_arg.second};
+    ztype out = {out_arg.first, out_arg.second};
+    ztype tmp = {reinterpret_cast<rtype*>(buffer_),
+		 reinterpret_cast<rtype*>(buffer_) + size_[0]};
+    fft2d_zoptdx(&setup_, &in, in_r_stride, in_c_stride,
+		 &out, out_r_stride, out_c_stride,
+		 &tmp, l2size_[1], l2size_[0], dir, ESAL);
+  }
+  void cipm(std::complex<rtype> *inout, stride_type stride,
+	    dimension_type axis, long dir)
+  {
+    fftm_cipdx(&setup_, reinterpret_cast<ctype *>(inout),
+	       2, 2 * stride, l2size_[1 - axis], size_[axis], dir, ESAL);
+  }
+  void zipm(std::pair<rtype*, rtype*> inout,
+	    stride_type r_stride, stride_type c_stride,
+	    dimension_type axis, long dir)
+  {
+    ztype data = {inout.first, inout.second};
+    fftm_zipdx(&setup_, &data, r_stride, c_stride,
+	       l2size_[1 - axis], size_[axis], dir, ESAL);
+  }
+  void ropm(rtype *in, stride_type in_stride,
+	    std::complex<rtype> *out_arg, stride_type out_stride,
+	    dimension_type axis)
+  {
+    rtype *out = reinterpret_cast<rtype*>(out_arg);
+    fftm_roptdx(&setup_, in, 1, in_stride,
+		out, 1, in_stride + 2,
+		reinterpret_cast<rtype*>(buffer_),
+		l2size_[1], size_[0], FFT_FORWARD, ESAL);
+    // Unpack the data (see SAL reference for details), and scale back by 1/2.
+    int const N = size_[1] + 2;
+    rtype scale = scale_ * 0.5f;
+    for (unsigned int i = 0; i != size_[0]; ++i, out += in_stride + 2)
+    {
+      out[N - 2] = out[1];
+      out[N - 1] = 0.;
+      out[1] = 0.f;
+      vsmuldx(out, 1, &scale, out, 1, N, ESAL);
+    }
+  }
+  void ropm(std::complex<rtype> *in_arg, stride_type in_stride,
+	    rtype *out, stride_type out_stride,
+	    dimension_type axis)
+  {
+    rtype *in = reinterpret_cast<rtype*>(in_arg);
+    // Pack the data (see SAL reference for details).
+    int const N = size_[1] + 2;
+    for (unsigned int i = 0; i != size_[0]; ++i, in += in_stride + 2)
+    {
+      in[1] = in[N - 2];
+    }
+    in = reinterpret_cast<rtype*>(in_arg);
+    fftm_ropdx(&setup_, in, 1, in_stride, out, 1, in_stride + 2,
+	       l2size_[axis], size_[1 - axis], FFT_INVERSE, ESAL);
+  }
+  void copm(std::complex<rtype> *in, stride_type in_stride,
+	    std::complex<rtype> *out, stride_type out_stride,
+	    dimension_type axis, long dir)
+  {
+    fftm_coptdx(&setup_, reinterpret_cast<ctype *>(in), 2, 2 * in_stride,
+		reinterpret_cast<ctype *>(out), 2, 2 * out_stride,
+		reinterpret_cast<ctype *>(buffer_),
+		l2size_[axis], size_[1 - axis], dir, ESAL);
+  }
+
+  FFT_setupd setup_;
+  length_type size_[D];
+  length_type l2size_[D];
+  rtype scale_;
+  ctype *buffer_;
+};
+
+template <typename T> struct precision;
+template <> struct precision<float> { static bool const single = true;};
+template <> struct precision<double> { static bool const single = false;};
+
+template <dimension_type D, typename inT, typename outT,
+	  int Axis, bool Fwd> class impl;
+
+template <typename T, int Axis, bool Fwd>
+class impl<1, std::complex<T>, std::complex<T>, Axis, Fwd>
+  : private fft_base<1, precision<T>::single>,
+    public fft::backend<1, std::complex<T>, std::complex<T>, Axis, Fwd>
+{
+public:
+  impl(Domain<1> const &dom, T scale)
+    : fft_base<1, precision<T>::single>(dom, 0, scale) 
+  {
+  }
+
+  virtual void in_place(std::complex<T> *data,
+			stride_type stride, length_type size)
+  {
+    assert(stride == 1);
+    assert(size == this->size_[0]);
+    cip(data, Fwd ? FFT_FORWARD : FFT_INVERSE);
+    if (!almost_equal(this->scale_, T(1.)))
+      scale(data, this->size_[0], this->scale_);
+  }
+
+  virtual void in_place(std::pair<T *, T *> data,
+			stride_type stride, length_type size)
+  {
+    assert(size == this->size_[0]);
+    zip(data, stride, Fwd ? FFT_FORWARD : FFT_INVERSE);
+    if (!almost_equal(this->scale_, T(1.)))
+    {
+      scale(data.first, this->size_[0], this->scale_);
+      scale(data.second, this->size_[0], this->scale_);
+    }
+  }
+
+  virtual void by_reference(std::complex<T> *in, stride_type in_stride,
+			    std::complex<T> *out, stride_type out_stride,
+			    length_type size)
+  {
+    assert(in_stride == 1 && out_stride == 1);
+    assert(size == this->size_[0]);
+    cop(in, out, Fwd ? FFT_FORWARD : FFT_INVERSE);
+    if (!almost_equal(this->scale_, T(1.)))
+      scale(out, this->size_[0], this->scale_);
+  }
+  virtual void by_reference(std::pair<T *, T *> in, stride_type in_stride,
+			    std::pair<T *, T *> out, stride_type out_stride,
+			    length_type size)
+  {
+    assert(size == this->size_[0]);
+    zop(in, in_stride, out, out_stride, Fwd ? FFT_FORWARD : FFT_INVERSE);
+    if (!almost_equal(this->scale_, T(1.)))
+    {
+      scale(out.first, this->size_[0], this->scale_);
+      scale(out.second, this->size_[0], this->scale_);
+    }
+  }
+};
+
+template <typename T, int Axis>
+class impl<1, T, std::complex<T>, Axis, true>
+  : private fft_base<1, precision<T>::single>,
+    public fft::backend<1, T, std::complex<T>, Axis, true>
+{
+public:
+  impl(Domain<1> const &dom, T scale)
+    : fft_base<1, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void by_reference(T *in, stride_type in_stride,
+			    std::complex<T> *out, stride_type out_stride,
+			    length_type size)
+  {
+    assert(in_stride == 1 && out_stride == 1);
+    rop(in, out);
+    T s = this->scale_ * 0.5;
+    if (!almost_equal(s, T(1.)))
+      scale(out, this->size_[0]/2 + 1, s);
+  }
+  virtual void by_reference(T *in, stride_type in_stride,
+			    std::pair<T *, T *> out, stride_type out_stride,
+			    length_type size)
+  {
+  }
+};
+
+template <typename T, int Axis>
+class impl<1, std::complex<T>, T, Axis, false>
+  : private fft_base<1, precision<T>::single>,
+    public fft::backend<1, std::complex<T>, T, Axis, false>
+{
+public:
+  impl(Domain<1> const &dom, T scale)
+    : fft_base<1, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void by_reference(std::complex<T> *in, stride_type in_stride,
+			    T *out, stride_type out_stride,
+			    length_type size)
+  {
+    assert(in_stride == 1 && out_stride == 1);
+    assert(size == this->size_[0]);
+    rop(in, out);
+    if (!almost_equal(this->scale_, T(1.)))
+      scale(out, this->size_[0], this->scale_);
+  }
+  virtual void by_reference(std::pair<T *, T *> in, stride_type in_stride,
+			    T *out, stride_type out_stride,
+			    length_type size)
+  {
+  }
+};
+
+template <typename T, int Axis, bool Fwd>
+class impl<2, std::complex<T>, std::complex<T>, Axis, Fwd>
+  : private fft_base<2, precision<T>::single>,
+    public fft::backend<2, std::complex<T>, std::complex<T>, Axis, Fwd>
+{
+public:
+  impl(Domain<2> const &dom, T scale)
+    : fft_base<2, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void in_place(std::complex<T> *inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols)
+  {
+    cip2d(inout, Axis, Fwd);
+    if (!almost_equal(this->scale_, T(1.)))
+      if (Axis == 0)
+	for (length_type i = 0; i != cols; ++i)
+	  scale(inout + i * c_stride, this->size_[0], this->scale_);
+      else // Axis == 1
+	for (length_type i = 0; i != rows; ++i)
+	  scale(inout + i * r_stride, this->size_[1], this->scale_);
+  }
+
+  virtual void in_place(std::pair<T *, T *> inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols)
+  {
+  }
+
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    if (Axis == 0)
+    {
+      assert(in_r_stride == 1 && out_r_stride == 1);
+      cop2d(in, in_c_stride, out, out_c_stride, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != cols; ++i)
+	  scale(out + i * out_c_stride, this->size_[0], this->scale_);
+    }
+    else // Axis == 1
+    {
+      assert(in_c_stride == 1 && out_c_stride == 1);
+      cop2d(in, in_r_stride, out, out_r_stride, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != rows; ++i)
+	  scale(out + i * out_r_stride, this->size_[1], this->scale_);
+    }
+  }
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    zop2d(in, in_r_stride, in_c_stride, out, out_r_stride, out_c_stride,
+	  Fwd ? FFT_FORWARD : FFT_INVERSE);
+    if (!almost_equal(this->scale_, T(1.)))
+      for (length_type i = 0; i != rows; ++i)
+      {
+	scale(out.first + i * out_r_stride, this->size_[1], this->scale_);
+	scale(out.second + i * out_r_stride, this->size_[1], this->scale_);
+      }
+  }
+};
+
+template <typename T, int Axis>
+class impl<2, T, std::complex<T>, Axis, true>
+  : private fft_base<2, precision<T>::single>,
+    public fft::backend<2, T, std::complex<T>, Axis, true>
+{
+public:
+  impl(Domain<2> const &dom, T scale)
+    : fft_base<2, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void by_reference(T *in, stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    T s = this->scale_ * 0.5;
+    if (Axis == 0)
+    {
+      assert(in_r_stride == 1 && out_r_stride == 1);
+      rop2d(in, in_c_stride, out, out_c_stride, 0);
+      if (!almost_equal(s, T(1.)))
+	for (length_type i = 0; i != cols; ++i)
+	  scale(out + i * out_c_stride, this->size_[0] / 2 + 1, s);
+    }
+    else
+    {
+      assert(in_c_stride == 1 && out_c_stride == 1);
+      rop2d(in, in_r_stride, out, out_r_stride, 1);
+      if (!almost_equal(s, T(1.)))
+	for (length_type i = 0; i != rows; ++i)
+	  scale(out + i * out_r_stride, this->size_[1] / 2 + 1, s);
+    }
+  }
+  virtual void by_reference(T *in, stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+  }
+};
+
+template <typename T, int Axis>
+class impl<2, std::complex<T>, T, Axis, false>
+  : private fft_base<2, precision<T>::single>,
+    public fft::backend<2, std::complex<T>, T, Axis, false>
+{
+public:
+  impl(Domain<2> const &dom, T scale)
+    : fft_base<2, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    if (Axis == 0)
+    {
+      assert(in_r_stride == 1 && out_r_stride == 1);
+      rop2d(in, in_c_stride, out, out_c_stride, 0);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != cols; ++i)
+	  scale(out + i * out_c_stride, this->size_[0], this->scale_);
+    }
+    else
+    {
+      assert(in_c_stride == 1 && out_c_stride == 1);
+      rop2d(in, in_r_stride, out, out_r_stride, 0);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != rows; ++i)
+	  scale(out + i * out_r_stride, this->size_[1], this->scale_);
+    }
+  }
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+  }
+};
+
+template <typename inT, typename outT, bool Row, bool Fwd> class fftm;
+
+template <typename T, bool Row>
+class fftm<T, std::complex<T>, Row, true>
+  : private fft_base<2, precision<T>::single>,
+    public fft::fftm<T, std::complex<T>, Row, true>
+{
+public:
+  fftm(Domain<2> const &dom, T scale)
+    : fft_base<2, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void by_reference(T *in, stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    if (Row)
+    {
+      assert(in_c_stride == 1 && out_c_stride == 1);
+      ropm(in, in_r_stride, out, out_r_stride, 1);
+      // Scaling done in ropm()
+    }
+    else // !Row
+    {
+      assert(in_r_stride == 1 && out_r_stride == 1);
+      ropm(in, in_c_stride, out, out_c_stride, 0);
+      // Scaling done in ropm()
+    }
+  }
+  virtual void by_reference(T *in, stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    if (Row) assert(in_c_stride == 1);
+    else assert(in_r_stride == 1);
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+  }
+};
+
+template <typename T, bool Row>
+class fftm<std::complex<T>, T, Row, false>
+  : private fft_base<2, precision<T>::single>,
+    public fft::fftm<std::complex<T>, T, Row, false>
+{
+public:
+  fftm(Domain<2> const &dom, T scale)
+    : fft_base<2, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    if (Row) assert(in_c_stride == 1 && out_c_stride == 1);
+    else assert(in_r_stride == 1 && out_r_stride == 1);
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+  }
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    T *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+  }
+};
+
+template <typename T, bool Row, bool Fwd>
+class fftm<std::complex<T>, std::complex<T>, Row, Fwd>
+  : private fft_base<2, precision<T>::single>,
+    public fft::fftm<std::complex<T>, std::complex<T>, Row, Fwd>
+{
+public:
+  fftm(Domain<2> const &dom, T scale)
+    : fft_base<2, precision<T>::single>(dom, 0, scale) {}
+
+  virtual void in_place(std::complex<T> *inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    if (Row)
+    {
+      assert(c_stride == 1);
+      cipm(inout, r_stride, 0, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != rows; ++i)
+	  scale(inout + i * r_stride, cols, this->scale_);
+    }
+    else
+    {
+      assert(r_stride == 1);
+      cipm(inout, c_stride, 1, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != cols; ++i)
+	  scale(inout + i * c_stride, rows, this->scale_);
+    }
+  }
+
+  virtual void in_place(std::pair<T *, T *> inout,
+			stride_type r_stride, stride_type c_stride,
+			length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    if (Row)
+    {
+      assert(c_stride == 1);
+      zipm(inout, r_stride, c_stride, 0, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != rows; ++i)
+	{
+	  scale(inout.first + i * r_stride, cols, this->scale_);
+	  scale(inout.second + i * r_stride, cols, this->scale_);
+	}
+    }
+    else
+    {
+      assert(r_stride == 1);
+      zipm(inout, r_stride, c_stride, 1, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != cols; ++i)
+	{
+	  scale(inout.first + i * c_stride, rows, this->scale_);
+	  scale(inout.second + i * c_stride, rows, this->scale_);
+	}
+    }
+  }
+
+  virtual void by_reference(std::complex<T> *in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::complex<T> *out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+    assert(rows == this->size_[0] && cols == this->size_[1]);
+    if (Row)
+    {
+      assert(in_c_stride == 1 && out_c_stride == 1);
+      copm(in, in_r_stride, out, out_r_stride, 1, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != rows; ++i)
+	  scale(out + i * out_r_stride, cols, this->scale_);
+    }
+    else
+    {
+      assert(in_r_stride == 1 && out_r_stride == 1);
+      copm(in, in_c_stride, out, out_c_stride, 0, Fwd ? FFT_FORWARD : FFT_INVERSE);
+      if (!almost_equal(this->scale_, T(1.)))
+	for (length_type i = 0; i != cols; ++i)
+	  scale(out + i * out_c_stride, rows, this->scale_);
+    }
+  }
+  virtual void by_reference(std::pair<T *, T *> in,
+			    stride_type in_r_stride, stride_type in_c_stride,
+			    std::pair<T *, T *> out,
+			    stride_type out_r_stride, stride_type out_c_stride,
+			    length_type rows, length_type cols)
+  {
+  }
+};
+
+#define VSIPL_IMPL_PROVIDE(D, I, O, A, F)				     \
+template <>                                                                  \
+std::auto_ptr<fft::backend<D, I, O, A, F> >				     \
+create(Domain<D> const &dom, fft::backend<D, I, O, A, F>::scalar_type scale) \
+{                                                                            \
+  return std::auto_ptr<fft::backend<D, I, O, A, F> >			     \
+    (new impl<D, I, O, A, F>(dom, scale));				     \
+}
+
+VSIPL_IMPL_PROVIDE(1, float, std::complex<float>, 0, true)
+VSIPL_IMPL_PROVIDE(1, std::complex<float>, float, 0, false)
+VSIPL_IMPL_PROVIDE(1, std::complex<float>, std::complex<float>, 0, true)
+VSIPL_IMPL_PROVIDE(1, std::complex<float>, std::complex<float>, 0, false)
+VSIPL_IMPL_PROVIDE(1, double, std::complex<double>, 0, true)
+VSIPL_IMPL_PROVIDE(1, std::complex<double>, double, 0, false)
+VSIPL_IMPL_PROVIDE(1, std::complex<double>, std::complex<double>, 0, true)
+VSIPL_IMPL_PROVIDE(1, std::complex<double>, std::complex<double>, 0, false)
+
+VSIPL_IMPL_PROVIDE(2, float, std::complex<float>, 0, true)
+VSIPL_IMPL_PROVIDE(2, float, std::complex<float>, 1, true)
+VSIPL_IMPL_PROVIDE(2, std::complex<float>, float, 0, false)
+VSIPL_IMPL_PROVIDE(2, std::complex<float>, float, 1, false)
+VSIPL_IMPL_PROVIDE(2, std::complex<float>, std::complex<float>, 0, true)
+VSIPL_IMPL_PROVIDE(2, std::complex<float>, std::complex<float>, 1, true)
+VSIPL_IMPL_PROVIDE(2, std::complex<float>, std::complex<float>, 0, false)
+VSIPL_IMPL_PROVIDE(2, std::complex<float>, std::complex<float>, 1, false)
+VSIPL_IMPL_PROVIDE(2, double, std::complex<double>, 0, true)
+VSIPL_IMPL_PROVIDE(2, double, std::complex<double>, 1, true)
+VSIPL_IMPL_PROVIDE(2, std::complex<double>, double, 0, false)
+VSIPL_IMPL_PROVIDE(2, std::complex<double>, double, 1, false)
+VSIPL_IMPL_PROVIDE(2, std::complex<double>, std::complex<double>, 0, true)
+VSIPL_IMPL_PROVIDE(2, std::complex<double>, std::complex<double>, 1, true)
+VSIPL_IMPL_PROVIDE(2, std::complex<double>, std::complex<double>, 0, false)
+VSIPL_IMPL_PROVIDE(2, std::complex<double>, std::complex<double>, 1, false)
+
+#undef VSIPL_IMPL_PROVIDE
+
+#define VSIPL_IMPL_PROVIDE(I, O, R, D)				  \
+template <>                                                       \
+std::auto_ptr<fft::fftm<I, O, R, D> >				  \
+create(Domain<2> const &dom, impl::Scalar_of<I>::type scale)      \
+{                                                                 \
+  return std::auto_ptr<fft::fftm<I, O, R, D> >			  \
+    (new fftm<I, O, R, D>(dom, scale));	                          \
+}
+
+VSIPL_IMPL_PROVIDE(float, std::complex<float>, true, true)
+VSIPL_IMPL_PROVIDE(float, std::complex<float>, false, true)
+VSIPL_IMPL_PROVIDE(std::complex<float>, float, true, false)
+VSIPL_IMPL_PROVIDE(std::complex<float>, float, false, false)
+VSIPL_IMPL_PROVIDE(std::complex<float>, std::complex<float>, true, true)
+VSIPL_IMPL_PROVIDE(std::complex<float>, std::complex<float>, false, true)
+VSIPL_IMPL_PROVIDE(std::complex<float>, std::complex<float>, true, false)
+VSIPL_IMPL_PROVIDE(std::complex<float>, std::complex<float>, false, false)
+
+VSIPL_IMPL_PROVIDE(double, std::complex<double>, true, true)
+VSIPL_IMPL_PROVIDE(double, std::complex<double>, false, true)
+VSIPL_IMPL_PROVIDE(std::complex<double>, double, true, false)
+VSIPL_IMPL_PROVIDE(std::complex<double>, double, false, false)
+VSIPL_IMPL_PROVIDE(std::complex<double>, std::complex<double>, true, true)
+VSIPL_IMPL_PROVIDE(std::complex<double>, std::complex<double>, false, true)
+VSIPL_IMPL_PROVIDE(std::complex<double>, std::complex<double>, true, false)
+VSIPL_IMPL_PROVIDE(std::complex<double>, std::complex<double>, false, false)
+
+#undef VSIPL_IMPL_PROVIDE
+
+} // namespace vsip::impl::sal
+} // namespace vsip::impl
+} // namespace vsip
Index: src/vsip/impl/sal/fft.hpp
===================================================================
RCS file: /home/cvs/Repository/vpp/src/vsip/impl/sal/fft.hpp,v
retrieving revision 1.5
diff -u -r1.5 fft.hpp
--- src/vsip/impl/sal/fft.hpp	4 Mar 2006 23:03:30 -0000	1.5
+++ src/vsip/impl/sal/fft.hpp	7 Mar 2006 03:55:21 -0000
@@ -1,9 +1,10 @@
 /* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
 
-/** @file    vsip/impl/sal/sal.hpp
+/** @file    vsip/impl/sal/fft.hpp
     @author  Stefan Seefeld
     @date    2006-02-02
-    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with Mercury's SAL.
+    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with 
+             Mercury's SAL.
 */
 
 #ifndef VSIP_IMPL_SAL_FFT_HPP
@@ -15,8 +16,8 @@
 
 #include <vsip/support.hpp>
 #include <vsip/domain.hpp>
-#include <vsip/impl/signal-fft.hpp>
-#include <sal.h>
+#include <vsip/impl/fft/factory.hpp>
+#include <vsip/impl/fft/util.hpp>
 
 /***********************************************************************
   Declarations
@@ -29,636 +30,108 @@
 namespace sal
 {
 
-// TODO: figure out what exactly this ESAL flag is and tune it.
-//       (requires SAL, not CSAL, and real h/w)
-long const ESAL = 0;
+/// These are the entry points into the SAL FFT bridge.
+template <typename I, dimension_type D, typename S>
+std::auto_ptr<I>
+create(Domain<D> const &dom, S scale);
 
-inline unsigned int
-int_log2(unsigned int size)    // assume size = 2^n, != 0, return n.
-{
-  unsigned int n = 0;
-  while (size >>= 1) ++n;
-  return n;
-}
+} // namespace vsip::impl::sal
 
-template <dimension_type D> struct log2n;
 
-template <>
-struct log2n<1>
+namespace fft
 {
-  static unsigned long translate(Domain<1> const &d, int, unsigned long *out) 
-  {
-    *out = int_log2(d.size());
-    return *out;
-  }
-};
+struct Mercury_sal_tag;
 
-template <>
-struct log2n<2>
+template <typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<1, inT, outT, sD, rmT, Mercury_sal_tag>
 {
-  static unsigned long translate(Domain<2> const &d, int sd, unsigned long *out)
+  static bool const ct_valid = true;
+  static bool rt_valid(Domain<1> const &dom)
   {
-    // If sd == 1, invert size[0] and size[1], i.e. transpose.
-    *out = int_log2(d[sd].size());
-    *(out + 1) = int_log2(d[1 - sd].size());
-    return std::max(*out, *(out + 1));
+    // SAL can only deal with powers of 2.
+    if (dom.size() & (dom.size() - 1)) return false;
+    else return true;
   }
-};
-
-template <>
-struct log2n<3>
-{
-  static unsigned long translate(Domain<3> const &d, int sd, unsigned long *out)
+  static std::auto_ptr<backend<1, inT, outT,
+			       axis<inT, outT, sD>::value,
+			       direction<inT, outT, sD>::value> > 
+  create(Domain<1> const &dom, typename impl::Scalar_of<inT>::type scale)
   {
-    // If sd == 1, invert size[0] and size[1], i.e. transpose.
-    *out = int_log2(d[sd].size());
-    *(out + 1) = int_log2(d[1 - sd].size());
-    *(out + 2) = int_log2(d[2].size());
-    return std::max(*out, std::max(*(out + 1), *(out + 2)));
+    return sal::create<backend<1, inT, outT,
+                               axis<inT, outT, sD>::value,
+                               direction<inT, outT, sD>::value> >
+      (dom, scale);
   }
 };
 
-/// Helper trait used to discriminate setup / cleanup functions and options.
-template <typename inT, typename outT> struct fft_inout_trait;
-
-template <>
-struct fft_inout_trait<float, float>
-{
-  static bool const single = true;
-  static long const option = FFT_REAL_ONLY;
-};
-
-template <>
-struct fft_inout_trait<float, std::complex<float> >
-{
-  static bool const single = true;
-  static long const option = 0;
-};
-
-template <>
-struct fft_inout_trait<std::complex<float>, float>
-{
-  static bool const single = true;
-  static long const option = 0;
-};
-
-template <>
-struct fft_inout_trait<std::complex<float>, std::complex<float> >
-{
-  static bool const single = true;
-  static long const option = FFT_COMPLEX_ONLY;
-};
-
-template <>
-struct fft_inout_trait<double, double>
-{
-  static bool const single = false;
-  static long const option = FFT_REAL_ONLY;
-};
-
-template <>
-struct fft_inout_trait<double, std::complex<double> >
-{
-  static bool const single = false;
-  static long const option = 0;
-};
-
-template <>
-struct fft_inout_trait<std::complex<double>, double>
-{
-  static bool const single = false;
-  static long const option = 0;
-};
-
-template <>
-struct fft_inout_trait<std::complex<double>, std::complex<double> >
+template <dimension_type Dim,
+	  typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<Dim, inT, outT, sD, rmT, Mercury_sal_tag>
 {
-  static bool const single = false;
-  static long const option = FFT_COMPLEX_ONLY;
-};
-
-template <dimension_type D,
-	  typename inT, typename outT,
-	  bool single = fft_inout_trait<inT, outT>::single >
-struct fft_planner;
-
-template <dimension_type D,
-	  typename inT, typename outT>
-struct fft_planner<D, inT, outT, true /* single */>
-{
-  static void
-  create(void *&plan, unsigned long size) VSIP_THROW((std::bad_alloc))
-  {
-    long options = fft_inout_trait<inT, outT>::option;
-    FFT_setup setup = 0;
-    unsigned long nbytes = 0;
-    fft_setup(size, options, &setup, &nbytes);
-    plan = setup;
-  }
-  static void 
-  destroy(void *plan)
+  static bool const ct_valid = true;
+  static bool rt_valid(Domain<Dim> const &dom)
   {
-    FFT_setup setup = reinterpret_cast<FFT_setup>(plan);
-    fft_free(&setup);
-  }
-};
+    // SAL can only deal with powers of 2.
+    for (dimension_type d = 0; d != Dim; ++d)
+      if (dom[d].size() & (dom[d].size() - 1)) return false;
 
-template <dimension_type D,
-	  typename inT, typename outT>
-struct fft_planner<D, inT, outT, false /* single */>
-{
-  static void
-  create(void *&plan, unsigned long size) VSIP_THROW((std::bad_alloc))
-  {
-    long options = fft_inout_trait<inT, outT>::option;
-    FFT_setupd setup = 0;
-    unsigned long nbytes = 0;
-    fft_setupd(size, options, &setup, &nbytes);
-    plan = setup;
+    return true;
   }
-  static void 
-  destroy(void *plan)
+  static std::auto_ptr<backend<Dim, inT, outT,
+			       axis<inT, outT, sD>::value,
+			       direction<inT, outT, sD>::value> >
+  create(Domain<Dim> const &dom, typename impl::Scalar_of<inT>::type scale)
   {
-    FFT_setupd setup = reinterpret_cast<FFT_setupd>(plan);
-    fft_freed(&setup);
+    return sal::create<backend<Dim, inT, outT,
+                               axis<inT, outT, sD>::value,
+                               direction<inT, outT, sD>::value> >
+      (dom, scale);
   }
 };
 
-} // namespace vsip::impl::sal
-
-template<dimension_type D, typename inT, typename outT, bool doFftm>
-inline void
-create_plan(Fft_core<D, inT, outT, doFftm>& self, Domain<D> const& dom, 
-	    int sd, int expn, unsigned /* will_call */, inT*, outT*)
-  VSIP_THROW((std::bad_alloc))
-{
-  self.is_forward_ = (expn == -1);
-  self.buffer_ = alloc_align<typename Complex_of<inT>::type>(32, dom.size());
-  unsigned long max = sal::log2n<D>::translate(dom, sd, self.size_);
-  sal::fft_planner<D, inT, outT>::create(self.plan_, max);
-}
-
-template<dimension_type D, typename inT, typename outT, bool doFftm>
-inline void
-destroy(Fft_core<D, inT, outT, doFftm>& self) VSIP_THROW((std::bad_alloc))
-{
-  sal::fft_planner<D, inT, outT>::destroy(self.plan_);
-  free_align(self.buffer_);
-}
-
-inline void
-in_place(Fft_core<1, std::complex<float>, std::complex<float>, false>& self,
-	 std::complex<float> *inout) VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  COMPLEX *data = reinterpret_cast<COMPLEX *>(inout);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft_cipx(&setup, data, stride, self.size_[0], direction, sal::ESAL);
-}
-
-inline void
-in_place(Fft_core<1, std::complex<double>, std::complex<double>, false>& self,
-	 std::complex<double> *inout) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  DOUBLE_COMPLEX *data = reinterpret_cast<DOUBLE_COMPLEX *>(inout);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft_cipdx(&setup, data, stride, self.size_[0], direction, sal::ESAL);
-}
-
-inline void
-in_place(Fft_core<2, std::complex<float>, std::complex<float>, false>& self,
-	 std::complex<float> *inout) VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  COMPLEX *data = reinterpret_cast<COMPLEX *>(inout);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft2d_cipx(&setup, data, stride, 2 << self.size_[1],
-	     self.size_[1], self.size_[0], direction, sal::ESAL);
-}
-
-inline void
-in_place(Fft_core<2, std::complex<double>, std::complex<double>, false>& self,
-	 std::complex<double> *inout) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  DOUBLE_COMPLEX *data = reinterpret_cast<DOUBLE_COMPLEX *>(inout);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft2d_cipdx(&setup, data, stride, 2 << self.size_[1],
-	      self.size_[1], self.size_[0], direction, sal::ESAL);
-}
-
-template <vsip::dimension_type D, typename inT, typename outT, bool doFftm>
-inline void
-from_to(Fft_core<D, inT, outT, doFftm>&, inT const*, outT*) VSIP_NOTHROW
-{
-  VSIP_IMPL_THROW(vsip::impl::unimplemented("SAL FFT generic from_to()"));
-}
-
-// 1D real -> complex forward fft
-
-inline void
-from_to(Fft_core<1, float, std::complex<float>, false>& self,
-	float const* in, std::complex<float>* out_arg)
-  VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  float *out = reinterpret_cast<float*>(out_arg);
-  fft_roptx(&setup, const_cast<float*>(in), 1, out, 1,
-	    reinterpret_cast<float*>(self.buffer_),
-	    self.size_[0], FFT_FORWARD, sal::ESAL);
-  // unpack the data (see SAL reference for details).
-  int const N = (1 << self.size_[0]) + 2;
-  out[N - 2] = out[1];
-  out[1] = 0.f;
-  out[N - 1] = 0.f;
-  // forward fft_ropx is scaled up by 2.
-  float scale = 0.5f;
-  vsmulx(out, 1, &scale, out, 1, N, sal::ESAL);
-}
-
-// 1D complex -> real inverse fft
-
-inline void
-from_to(Fft_core<1, std::complex<float>, float, false>& self,
-	std::complex<float> const* in, float* out)
-  VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  float *in_ = 
-    reinterpret_cast<float *>(const_cast<std::complex<float>*>(in));
-  // pack the data (see SAL reference for details).
-  int const N = (1 << self.size_[0]) + 2;
-  in_[1] = in_[N - 2];
-  in_[N - 2] = in_[N - 1] = 0.f;
-
-  fft_ropx(&setup, in_, 1,
-	   out, 1, self.size_[0], FFT_INVERSE, sal::ESAL);
-}
-
-// 1D real -> complex forward fft
-
-inline void
-from_to(Fft_core<1, double, std::complex<double>, false>& self,
-	double const* in, std::complex<double>* out_arg)
-  VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  double *out = reinterpret_cast<double*>(out_arg);
-  fft_roptdx(&setup, const_cast<double*>(in), 1, out, 1,
-	     reinterpret_cast<double*>(self.buffer_),
-	     self.size_[0], FFT_FORWARD, sal::ESAL);
-  // unpack the data (see SAL reference for details).
-  int const N = (1 << self.size_[0]) + 2;
-  out[N - 2] = out[1];
-  out[1] = 0.f;
-  out[N - 1] = 0.f;
-  // forward fft_ropx is scaled up by 2.
-  double scale = 0.5f;
-  vsmuldx(out, 1, &scale, out, 1, N, sal::ESAL);
-}
-
-// 1D complex -> real inverse fft
-
-inline void
-from_to(Fft_core<1, std::complex<double>, double, false>& self,
-	std::complex<double> const* in, double* out)
-  VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  double *in_ = 
-    reinterpret_cast<double *>(const_cast<std::complex<double>*>(in));
-  // pack the data (see SAL reference for details).
-  int const N = (1 << self.size_[0]) + 2;
-  in_[1] = in_[N - 2];
-  in_[N - 2] = in_[N - 1] = 0.f;
-
-  fft_ropdx(&setup, in_, 1,
-	    out, 1, self.size_[0], FFT_INVERSE, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<1, std::complex<float>, std::complex<float>, false>& self,
-	std::complex<float> const *in, std::complex<float> *out_arg) VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  COMPLEX *in_ = reinterpret_cast<COMPLEX *>(const_cast<std::complex<float>*>(in));
-  COMPLEX *out = reinterpret_cast<COMPLEX *>(out_arg);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft_coptx(&setup, in_, stride, out, stride, 
-	    reinterpret_cast<COMPLEX*>(self.buffer_),
-	    self.size_[0], direction, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<1, std::complex<double>, std::complex<double>, false>& self,
-	std::complex<double> const *in, std::complex<double> *out_arg) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  DOUBLE_COMPLEX *in_ = 
-    reinterpret_cast<DOUBLE_COMPLEX *>(const_cast<std::complex<double>*>(in));
-  DOUBLE_COMPLEX *out = reinterpret_cast<DOUBLE_COMPLEX *>(out_arg);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft_coptdx(&setup, in_, stride, out, stride,
-	     reinterpret_cast<DOUBLE_COMPLEX*>(self.buffer_),
-	     self.size_[0], direction, sal::ESAL);
-}
-
-// 2D real -> complex forward fft
-
-template <typename T>
-inline void 
-unpack(T *data, unsigned long N, unsigned long M, unsigned long stride)
-{
-  // unpack the data (see SAL reference, figure 3.6, for details).
-  unsigned long const M2 = M/2 + 1;
-  T t10r = data[N * stride];         // (1, 0).real()
-  T t10i = data[N * stride + 1];     // (1, 0).imag()
-  data[N * stride + 1] = 0.;         // (0, 0).imag()
-  data[N * stride - 2] = data[1];    // (0,-1).real()
-  data[N * stride - 1] = 0.;         // (0,-1).imag()
-  data[1] = 0.;
-  for (unsigned long r = 1; r != M2 - 1; ++r)
-  {
-    // set last row (r,-1)
-    data[(r + 1) * N * stride - 2] = data[2 * r * N * stride + 1];
-    data[(r + 1) * N * stride - 1] = data[(2 * r + 1) * N * stride + 1];
-    data[2 * r * N * stride + 1] = 0.;
-    data[(2 * r + 1) * N * stride + 1] = 0.;
-    // set first row (r, 0)
-    data[r * N * stride] = data[2 * r * N * stride];
-    data[r * N * stride + 1] = data[(2 * r + 1) * N * stride];
-    data[2 * r * N * stride] = 0.;
-    data[(2 * r + 1) * N * stride] = 0.;
-  }
-  data[(M2 - 1) * N * stride] = t10r;
-  data[(M2 - 1) * N * stride + 1] = 0.;
-  data[M2  * N * stride - 2] = t10i;
-  data[M2  * N * stride - 1] = 0;
-
-  // Now fill in the missing cells by symmetry.
-  for (unsigned long r = M2; r != M; ++r)
-  {
-    // first column (r, 0)
-    data[r * N * stride] = data[(M - r) * N * stride];
-    data[r * N * stride + 1] = - data[(M - r) * N * stride + 1];
-    // last column (r, -1)
-    data[(r + 1) * N * stride - 2] = data[(M - r + 1) * N * stride - 2];
-    data[(r + 1) * N * stride - 1] = - data[(M - r + 1) * N * stride - 1];
-  }
-}
-
-inline void
-from_to(Fft_core<2, float, std::complex<float>, false>& self,
-	float const* in, std::complex<float>* out_arg)
-  VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  float *out = reinterpret_cast<float*>(out_arg);
-  // The size of the output array is (N/2) x M (if measured in std::complex<float>)
-  unsigned long const N = (1 << self.size_[1]) + 2;
-  unsigned long const M = (1 << self.size_[0]);
-  fft2d_roptx(&setup, const_cast<float*>(in), self.stride_, self.dist_,
-	      out, self.stride_, N,
-	      reinterpret_cast<float*>(self.buffer_),
-	      self.size_[1], self.size_[0], FFT_FORWARD, sal::ESAL);
-
-  // unpack the data (see SAL reference, figure 3.6, for details).
-  unpack(out, N, M, self.stride_);
-  // forward fft_ropx is scaled up by 2.
-  float scale = 0.5f;
-  for (unsigned long i = 0; i != M; ++i, out += N)
-    vsmulx(out, self.stride_, &scale, out, self.stride_, N, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<2, double, std::complex<double>, false>& self,
-	double const* in, std::complex<double>* out_arg)
-  VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  double *out = reinterpret_cast<double*>(out_arg);
-  // The size of the output array is (N/2) x M (if measured in std::complex<float>)
-  unsigned long const N = (1 << self.size_[1]) + 2;
-  unsigned long const M = (1 << self.size_[0]);
-  fft2d_roptdx(&setup, const_cast<double*>(in), self.stride_, self.dist_,
-	       out, self.stride_, N,
-	       reinterpret_cast<double*>(self.buffer_),
-	       self.size_[1], self.size_[0], FFT_FORWARD, sal::ESAL);
-
-  // unpack the data (see SAL reference, figure 3.6, for details).
-  unpack(out, N, M, self.stride_);
-
-  // forward fft_ropx is scaled up by 2.
-  double scale = 0.5f;
-  for (unsigned long i = 0; i != M; ++i, out += N)
-    vsmuldx(out, self.stride_, &scale, out, self.stride_, N, sal::ESAL);
-}
-
-// 2D complex -> real inverse fft
-
-template <typename T>
-inline void 
-pack(T *data, unsigned long N, unsigned long M, unsigned long stride)
+template <typename inT,
+	  typename outT,
+	  int sD,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<3, inT, outT, sD, rmT, Mercury_sal_tag>
 {
-  unsigned long const M2 = M/2 + 1;
+  // No FFT 3D yet.
+  static bool const ct_valid = false;
+};
 
-  T t10i = data[M2  * N * stride - 2];
-  T t10r = data[(M2 - 1) * N * stride];
+} // namespace vsip::impl::fft
 
-  for (unsigned long r = M2 - 2; r; --r)
-  {
-    // pack first row (r, 0)
-    data[2 * r * N * stride] = data[r * N * stride];
-    data[(2 * r + 1) * N * stride] = data[r * N * stride + 1];
-    // pack last row (r,-1)
-    data[2 * r * N * stride + 1] = data[(r + 1) * N * stride - 2];
-    data[(2 * r + 1) * N * stride + 1] = data[(r + 1) * N * stride - 1];
-  }
-
-  data[N * stride] = t10r;         // (1, 0).real()
-  data[N * stride + 1] = t10i;     // (1, 0).imag()
-  data[1] = data[N * stride - 2];  // (0,-1).real()
-}
-
-inline void
-from_to(Fft_core<2, std::complex<float>, float, false>& self,
-	std::complex<float> const* in, float* out)
-  VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  float *in_ = 
-    reinterpret_cast<float *>(const_cast<std::complex<float>*>(in));
-  // The size of the output array is (N/2) x M (if measured in std::complex<float>)
-  // pack the data (see SAL reference for details).
-  int const N = (1 << self.size_[1]) + 2;
-  unsigned long const M = (1 << self.size_[0]);
-  pack(in_, N, M, self.stride_);
-  fft2d_ropx(&setup, in_, 1, N,
-	     out, 1, 1 << self.size_[1],
-	     self.size_[1], self.size_[0], FFT_INVERSE, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<2, std::complex<double>, double, false>& self,
-	std::complex<double> const* in, double* out)
-  VSIP_NOTHROW
+namespace fftm
 {
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  double *in_ = 
-    reinterpret_cast<double *>(const_cast<std::complex<double>*>(in));
-  // The size of the output array is (N/2) x M (if measured in std::complex<float>)
-  // pack the data (see SAL reference for details).
-  int const N = (1 << self.size_[1]) + 2;
-  unsigned long const M = (1 << self.size_[0]);
-  pack(in_, N, M, self.stride_);
-  fft2d_ropdx(&setup, in_, 1, N,
-	      out, 1, 1 << self.size_[1],
-	      self.size_[1], self.size_[0], FFT_INVERSE, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<2, std::complex<float>, std::complex<float>, false>& self,
-	std::complex<float> const *in, std::complex<float> *out_arg) VSIP_NOTHROW
+template <typename inT,
+	  typename outT,
+	  int A,
+	  int D,
+	  vsip::return_mechanism_type rmT>
+struct evaluator<inT, outT, A, D, rmT, fft::Mercury_sal_tag>
 {
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  COMPLEX *in_ = reinterpret_cast<COMPLEX *>(const_cast<std::complex<float>*>(in));
-  COMPLEX *out = reinterpret_cast<COMPLEX *>(out_arg);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft2d_coptx(&setup, in_, stride, 2 << self.size_[1],
-	      out, stride, 2 << self.size_[1],
-	      reinterpret_cast<COMPLEX*>(self.buffer_),
-	      self.size_[1], self.size_[0],
-	      direction, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<2, std::complex<double>, std::complex<double>, false>& self,
-	std::complex<double> const *in, std::complex<double> *out_arg) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  DOUBLE_COMPLEX *in_ = 
-    reinterpret_cast<DOUBLE_COMPLEX *>(const_cast<std::complex<double>*>(in));
-  DOUBLE_COMPLEX *out = reinterpret_cast<DOUBLE_COMPLEX *>(out_arg);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fft2d_coptdx(&setup, in_, stride, stride << self.size_[1],
-	       out, stride, stride << self.size_[1],
-	       reinterpret_cast<DOUBLE_COMPLEX*>(self.buffer_),
-	       self.size_[1], self.size_[0],
-	       direction, sal::ESAL);
-}
-
-// FFTM 
-
-inline void
-in_place(Fft_core<2, std::complex<float>, std::complex<float>, true>& self,
-	 std::complex<float> *inout) VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fftm_cipx(&setup, reinterpret_cast<COMPLEX *>(inout),
-	    2, 2 << self.size_[1], self.size_[1], self.runs_,
-	    direction, sal::ESAL);
-}
-
-inline void
-in_place(Fft_core<2, std::complex<double>, std::complex<double>, true>& self,
-	 std::complex<double> *inout) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fftm_cipdx(&setup, reinterpret_cast<DOUBLE_COMPLEX *>(inout),
-	    2, 2 << self.size_[1], self.size_[1], self.runs_,
-	    direction, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<2, std::complex<float>, std::complex<float>, true>& self,
-	std::complex<float> const *in, std::complex<float> *out_arg) VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  COMPLEX *in_ = 
-    reinterpret_cast<COMPLEX *>(const_cast<std::complex<float>*>(in));
-  COMPLEX *out = reinterpret_cast<COMPLEX *>(out_arg);
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fftm_coptx(&setup, in_, 2, 2 << self.size_[1],
-	     out, 2, 2 << self.size_[1],
-	     reinterpret_cast<COMPLEX*>(self.buffer_),
-	     self.size_[1], self.runs_,
-	     direction, sal::ESAL);
-}
-
-inline void
-from_to(Fft_core<2, float, std::complex<float>, true>& self,
-	float const *in, std::complex<float> *out_arg) VSIP_NOTHROW
-{
-  FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
-  float *out = reinterpret_cast<float*>(out_arg);
-  fftm_roptx(&setup, const_cast<float *>(in), self.stride_, self.dist_,
-	     out, self.stride_, self.dist_ + 2,
-	     reinterpret_cast<float*>(self.buffer_),
-	     self.size_[1], self.runs_,
-	     FFT_FORWARD, sal::ESAL);
-  // Unpack the data (see SAL reference for details), and scale back by 1/2.
-  int const N = (1 << self.size_[1]) + 2;
-  float scale = 0.5f;
-  for (int i = 0; i != self.runs_; ++i, out += self.dist_ + 2)
+  static bool const ct_valid = true;
+  static bool rt_valid(Domain<2> const &dom)
   {
-    out[(N - 2) * self.stride_] = out[self.stride_];
-    out[self.stride_] = 0.f;
-    out[(N - 1) * self.stride_] = 0.f;
-    vsmulx(out, self.stride_, &scale, out, self.stride_, N, sal::ESAL);
+    static dimension_type const axis = 1 - A;
+    // SAL can only deal with powers of 2.
+    if (dom[axis].size() & (dom[axis].size() - 1)) return false;
+    else return true;
   }
-}
-
-inline void
-from_to(Fft_core<2, double, std::complex<double>, true>& self,
-	double const *in, std::complex<double> *out_arg) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  double *out = reinterpret_cast<double*>(out_arg);
-  fftm_roptdx(&setup, const_cast<double *>(in), self.stride_, self.dist_,
-	     out, self.stride_, self.dist_ + 2,
-	     reinterpret_cast<double*>(self.buffer_),
-	     self.size_[1], self.runs_,
-	     FFT_FORWARD, sal::ESAL);
-  // Unpack the data (see SAL reference for details), and scale back by 1/2.
-  int const N = (1 << self.size_[1]) + 2;
-  double scale = 0.5f;
-  for (int i = 0; i != self.runs_; ++i, out += self.dist_ + 2)
+  static std::auto_ptr<fft::fftm<inT, outT, A == 0, D == -2> > 
+  create(Domain<2> const &dom, typename impl::Scalar_of<inT>::type scale)
   {
-    out[(N - 2) * self.stride_] = out[self.stride_];
-    out[self.stride_] = 0.f;
-    out[(N - 1) * self.stride_] = 0.f;
-    vsmuldx(out, self.stride_, &scale, out, self.stride_, N, sal::ESAL);
+    return sal::create<fft::fftm<inT, outT, A == 0, D == -2> >(dom, scale);
   }
-}
-
-inline void
-from_to(Fft_core<2, std::complex<double>, std::complex<double>, true>& self,
-	std::complex<double> const *in, std::complex<double> *out_arg) VSIP_NOTHROW
-{
-  FFT_setupd setup = reinterpret_cast<FFT_setupd>(self.plan_);
-  DOUBLE_COMPLEX *in_ = 
-    reinterpret_cast<DOUBLE_COMPLEX *>(const_cast<std::complex<double>*>(in));
-  DOUBLE_COMPLEX *out = reinterpret_cast<DOUBLE_COMPLEX *>(out_arg);
-  long stride = 2;
-  long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
-  fftm_coptdx(&setup, in_, stride, stride << self.size_[1],
-	      out, stride, stride << self.size_[1],
-	      reinterpret_cast<DOUBLE_COMPLEX*>(self.buffer_),
-	      self.size_[1], self.runs_,
-	      direction, sal::ESAL);
-}
+};
 
+} // namespace vsip::impl::fftm
 } // namespace vsip::impl
 } // namespace vsip
 
Index: tests/fftm.cpp
===================================================================
RCS file: /home/cvs/Repository/vpp/tests/fftm.cpp,v
retrieving revision 1.11
diff -u -r1.11 fftm.cpp
--- tests/fftm.cpp	4 Mar 2006 23:03:30 -0000	1.11
+++ tests/fftm.cpp	7 Mar 2006 03:55:21 -0000
@@ -256,7 +256,6 @@
     cout << "\n";
   }
 #endif
-
   f_fftm(in, out);
   i_fftm(out, inv);