Actions
|
|
[ Date Prev][ Date Next][ Thread Prev][ Thread Next][ Date Index][ Thread Index]
Re: [vsipl++] [patch] CML bindings for matrix transpose operations
- To: Don McCoy <don@xxxxxxxxxxxxxxxx>
- Subject: Re: [vsipl++] [patch] CML bindings for matrix transpose operations
- From: Stefan Seefeld <stefan@xxxxxxxxxxxxxxxx>
- Date: Wed, 04 Jun 2008 20:39:46 -0400
Don,
I have some small comments and questions (one for Jules). Overall this
looks good.
Don McCoy wrote:
This is patterned after the existing serial evaluator for transpose
operations using SIMD instructions, except that it dispatches operations
to CML. One important difference is that it handles split complex as
well as interleaved. Matrix copies are also performed (when the block
layouts match), but only if the strides are unit in the smallest
dimension, so as to have the potential to use the SPU's.
+// These macros support scalar and interleaved complex types
I don't find these macros very useful. Most of them you use exactly
once. And even the ones you use twice would result in simpler code if
spelled out. For example:
+
+#define VSIP_IMPL_CML_TRANS(T, FCN, CML_FCN) \
+ inline void \
+ FCN( \
+ T* a, ptrdiff_t rsa, ptrdiff_t csa, \
+ T* z, ptrdiff_t rsz, ptrdiff_t csz, \
+ size_t m, size_t n) \
+ { \
+ typedef Scalar_of<T>::type CML_T; \
+ CML_FCN( \
+ reinterpret_cast<CML_T*>(a), rsa, csa, \
+ reinterpret_cast<CML_T*>(z), rsz, csz, \
+ m, n ); \
+ }
+
+VSIP_IMPL_CML_TRANS(float, transpose, cml_mtrans_f)
+VSIP_IMPL_CML_TRANS(std::complex<float>, transpose, cml_cmtrans_f)
+#undef VSIP_IMPL_CML_TRANS
+
actually boils down to
inline void
transpose(float *a, ptrdiff_t rsa, ptrdiff_t csa,
float *z, ptrdiff_t rsz, ptrdiff_t csz,
size_t m, size_t n)
{
cml_mtrans_f(a, rsa, csa, z, rsz, csz, m, n);
}
for the first case, which I find much more readable than the macro
code above.
+ static bool rt_valid(DstBlock& dst, SrcBlock const& src)
+ {
+ bool rt = true;
+
+ // If performing a copy, both source and destination blocks
+ // must be unit stride.
+ if (Type_equal<src_order_type, dst_order_type>::value)
+ {
+ Ext_data<DstBlock> dst_ext(dst, SYNC_OUT);
+ Ext_data<SrcBlock> src_ext(src, SYNC_IN);
These objects only exist to check the strides, right ? I'm aware that we
don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we
? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this
context. Jules ?
+
+ dimension_type const s_dim1 = src_order_type::impl_dim1;
+ dimension_type const d_dim1 = src_order_type::impl_dim1;
Why two constants, if they hold the same value ?
+ if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1)
+ rt = false;
+ }
+
+ return rt;
+ }
+
+ static void exec(DstBlock& dst, SrcBlock const& src, row2_type, row2_type)
+ {
+ vsip::impl::Ext_data<DstBlock> dst_ext(dst, vsip::impl::SYNC_OUT);
+ vsip::impl::Ext_data<SrcBlock> src_ext(src, vsip::impl::SYNC_IN);
+
Why the full qualification here (but not above) ? (I know, this is
really picky, but I like compact and concise code. ;-) )
Thanks,
Stefan
--
Stefan Seefeld
CodeSourcery
stefan@xxxxxxxxxxxxxxxx
(650) 331-3385 x718
|