 |
|
|
|
Actions
|
|
[ Date Prev][ Date Next][ Thread Prev][ Thread Next][ Date Index][ Thread Index]
Re: [vsipl++] Fast convolution (slow!)
- To: Gaetano Mendola <gmendola@xxxxxxxxxxx>
- Subject: Re: [vsipl++] Fast convolution (slow!)
- From: Jules Bergmann <jules@xxxxxxxxxxxxxxxx>
- Date: Wed, 06 Aug 2008 14:52:45 -0400
Gaetano,
In my application I have to perform convolution of a sequence with N
points with a sequence of M points where N ~ 1E6 points, M ~ 1K
points. Having a startup time of 1 hour for my application is too
much, is there a way to serialize on disk an instance of
"vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd,
vsip::by_reference>"?
I agree, a startup time of 1 hour is excessive! It turns out VSIPL++
uses FFTW3's PATIENT planning mode when number of times == 0 (since
'0' indicates the FFT object will be used an infinite number of
times). As you're seeing, PATIENT can be very expensive.
If you set number of times to 15, VSIPL++ will use FFTW3's MEASURE
planning mode, which should be much faster than the PATIENT planning
mode, while still delivering reasonable performance.
For example, for a 1K and 1M FFTs (measuring MFLOP/s on a ~3.6 GHz
Xeon):
ESTIMATE MEASURE PATIENT
(times == 1) (times == 15) (times == 0)
1K 4959 6355 7482
32K 3985 5370 6488
1M 687 1735
While FFTW3 allows plans to be pre-computed and stored to disk, there
is no way to access the functionalty from VSIPL++ at present.
Also take a look at the following two tests:
Test1.
typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference> fwd_fft_type;
fwd_fft_type fwd_fft(1048576, 1.0); <== 25 minutes
Test2.
vsip::Vector<T> inputs(1048576);
typedef vsip::Fft<vsip::const_Vector, T, T, vsip::fft_fwd, vsip::by_reference, 1> fwd_fft_type;
fwd_fft_type fwd_fft(1048576, 1.0); <== 10 ms (as expected)
fwd_fft(inputs); <== 91 ms (I expected here around 25 mins)
I think something wrong is going on.
No, that's reasonable. It is the power of heuristics doing well
against a brute force search. PATIENT (number of times == 0) does a
thorough search of the space of possible FFTs, while ESTIMATE (number
of times == 1) uses simple heuristics.
Presumably, if you measured the FFT time for Test1, it would be faster
than 91 ms. If it wasn't, it would indicate that the ESTIMATE
heuristics happened to pick the best case heuristically, leaving
PATIENT with nothing better to find.
-- Jules
Regards
Gaetano Mendola
--
Jules Bergmann
CodeSourcery
jules@xxxxxxxxxxxxxxxx
(650) 331-3385 x705
|
|