How to filter an acoustic signal

I have a huge data file that is too large to ListPlot.

This code generates similar data:

datatest = RandomSample[Join[RandomReal[{0.5, 15}, 20], RandomReal[.1, 10000]]];
datatest2 = 5 + Riffle[datatest, -datatest];

I want to filter (delete) the part of the data that is not necessary as follows:

peaks = FindPeaks[datatest2, 0, 0, 5.2];
ListPlot[datatest2, PlotRange -> All, Joined -> True,
Epilog -> {Red, PointSize[0.01], Point[peaks]}]

Currently I am doing this in a rather long-winded way.

Is there any signal processing functionality in MMA that can do this easily?

Thank you

=================


I don’t quite understand. Isn’t the output of FindPeaks exactly what you’re looking for?
– yohbs
Feb 1 ’15 at 22:32

  

 

I want to keep the peaks and some data around them. In other words, I want to delete the marked parts in the plot.
– Algohi
Feb 1 ’15 at 22:34

  

 

I have more than 1 billion data points, which I can’t visualize. Most of the data is similar to the parts marked in the plot. I want to delete these parts so that I can visualize the data.
– Algohi
Feb 1 ’15 at 22:36

  

 

I think FindPeaks[] is the best way to do this. If it is taking too long, split datatest2 into k equal portions (e.g., k = 8 or 16) and then Parallelize[] your FindPeaks[] operation.
– David G. Stork
Feb 1 ’15 at 22:42
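
The chunk-and-parallelize suggestion above could look roughly like the following. This is only a sketch under assumptions: the chunk count of 8, the names chunkLen, chunks, chunkPeaks, offsets and peaksAll are illustrative, and peaks sitting exactly on a chunk boundary may be missed or reported differently than in a single FindPeaks call.

```mathematica
(* test data as generated in the question *)
datatest = RandomSample[Join[RandomReal[{0.5, 15}, 20], RandomReal[.1, 10000]]];
datatest2 = 5 + Riffle[datatest, -datatest];

(* split into about 8 chunks; the last chunk may be shorter *)
chunkLen = Ceiling[Length[datatest2]/8];
chunks = Partition[datatest2, UpTo[chunkLen]];

(* run FindPeaks on each chunk in parallel, same arguments as in the question *)
chunkPeaks = ParallelMap[FindPeaks[#, 0, 0, 5.2] &, chunks];

(* shift chunk-local positions back to positions in the full list *)
offsets = chunkLen*Range[0, Length[chunks] - 1];
peaksAll = Join @@ MapThread[
    Function[{pk, off}, ({off, 0} + # &) /@ pk],
    {chunkPeaks, offsets}];
```

Each entry of peaksAll is a {position, value} pair, in the same form FindPeaks returns for the whole list.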


The question is a bit unclear (squiggly marks on a graph are not a substitute for a precise specification), but this previous question seems related: mathematica.stackexchange.com/q/22528/484
– Rahul
Feb 2 ’15 at 1:50

=================

3 Answers

=================

I am not an expert (understatement of the year!) in signal processing, but you can use Band to create a sparse array that has as many 1s as you want around the positions of the peaks. I am not sure the following is the best way to do this, but it works:

With[{width = 200},
  spArray = SparseArray[
    Thread[
      (Band[# - width, # + width] & /@ peaks[[All, 1]]) -> 1],
    Length[datatest2]]
  ];

This means that I have made a sparse array with 1s in the 400 samples (±200) around each of your peaks. If I multiply this with your original array datatest2, these are the only elements that will survive:

ListPlot[spArray datatest2, PlotRange -> All, Joined -> True,
Epilog -> {Red, PointSize[0.01], Point[peaks]}]
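
If the aim is to end up with fewer points rather than zeros, the positions stored in the mask can be read off directly instead of expanding the mask to a dense list. A minimal sketch, assuming a 0/1 mask like spArray above; the toy data, the peak positions and the names data, peakPos, idx, mask, pos and reduced are made up for illustration, and the mask is built from explicit index rules rather than Band:

```mathematica
(* hypothetical toy data: flat baseline with two spikes *)
data = ConstantArray[5., 50];
data[[12]] = 9.; data[[40]] = 1.;
peakPos = {12, 40};  (* assumed known integer peak positions *)
width = 3;

(* indices within ±width of any peak, clipped to the valid range *)
idx = Union @@ (Range[Max[1, # - width], Min[Length[data], # + width]] & /@ peakPos);

(* 1D sparse mask with explicit 1s at those indices *)
mask = SparseArray[Thread[(List /@ idx) -> 1], {Length[data]}];

(* read the stored positions without calling Normal on the mask *)
pos = Flatten[mask["NonzeroPositions"]];
reduced = data[[pos]];
```

With a billion samples and only a few hundred peaks, pos stays small, whereas Normal on the mask would allocate a dense list as long as the data.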

  

 

This is a good approach; however, the number of points is still the same, which is the main problem. You can imagine how impossible it is to plot data containing 1 billion points. I will think of a way to remove those zeros without affecting the zeros around the peaks. Thanks.
– Algohi
Feb 2 ’15 at 21:31

  

 

This will do the job: pos = Flatten[Position[Normal[spArray], 1]]; datatest2[[pos]];
– Algohi
Feb 2 ’15 at 21:46

  

 

@Algohi I don’t quite understand what the connection is between the filter and the plotting bit. If you need to plot, why not use the method I posted and subsample the resulting array? Is it billions of data points with only a few peaks, or with millions of peaks? I am sure we (SE) can think of something if you re-edit to clarify the question a bit.
– gpap
Feb 2 ’15 at 22:18

  

 

It is a billion points with hundreds of peaks. When I said filter, I did not know what the correct word was. That is why I added "delete" after it. I hope this clarifies the issue, and sorry for any ambiguity :)
– Algohi
Feb 2 ’15 at 22:23

  

 

@Algohi, I didn’t mean it in a frustrated way 🙂 and by no means feel pressured to accept if it’s not working, but there are a few things I still don’t understand. Does FindPeaks work at all with your data? I doubt that it does with a billion points. So do you know where the peaks are? If not, do you know how many there are? If you don’t, that’s a (difficult) question in itself: “how to find hundreds of peaks among billions of points?”.
– gpap
Feb 3 ’15 at 10:21

I’m not sure I understand what you need, but here’s my try: If you want to keep only 2d+1 data points around each peak, you can use

toKeep = Map[# + Range[-d, d] &, peaks[[All, 1]]];
choppedData = Map[Part[datatest2, #] &, toKeep];

choppedData is a list of lists; the i-th list contains the 2d+1 values around the i-th peak.

To speed this up, you can use ParallelMap instead of Map.
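
Two practical details are worth guarding against here: windows that overlap when d exceeds half the spacing between neighbouring peaks, and windows that run past the ends of the list. One possible variation, as a sketch only, merges all windows into a single sorted index list and keeps the original sample positions for plotting. The value d = 200 and the names centers, idx and reducedXY are illustrative choices, and Round is applied only as a safeguard in case peak positions come back non-integer.

```mathematica
(* test data and peaks as in the question *)
datatest = RandomSample[Join[RandomReal[{0.5, 15}, 20], RandomReal[.1, 10000]]];
datatest2 = 5 + Riffle[datatest, -datatest];
peaks = FindPeaks[datatest2, 0, 0, 5.2];

d = 200; n = Length[datatest2];
centers = Round[peaks[[All, 1]]];  (* safeguard against non-integer positions *)

(* merged, duplicate-free indices, clipped to 1..n *)
idx = Union @@ (Range[Max[1, # - d], Min[n, # + d]] & /@ centers);

(* keep {position, value} pairs so the x axis still shows the original sample index *)
reducedXY = Transpose[{idx, datatest2[[idx]]}];
ListPlot[reducedXY, PlotRange -> All]
```

Unlike choppedData, reducedXY is one flat list, so each sample appears once even where windows overlap.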

  

 

Thanks, but note that peaks[[;;,1]] is not necessarily an integer; you may add Floor or Ceiling. Second, the windows will overlap if d is larger than the minimum distance between peaks, which creates additional work. Anyway, thanks for the answer.
– Algohi
Feb 1 ’15 at 22:56

A common way to remove outliers is with the median filter. What you want to do is the opposite: keep the outliers and remove the inliers. Subtracting the median-filtered signal from the data, then chopping small values to zero and selecting those whose magnitude exceeds a threshold is one way to proceed.

short = Select[Chop[datatest2 - MedianFilter[datatest2, 5], 0.5], Abs[#] > 0.1 &];
ListPlot[short, PlotRange -> All]

One downside is that this removes the mean value (about five in your data). You may wish to add this back in, and also to fiddle with the parameters to get the width you are looking for.
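
One way to keep the mean, sketched here under assumptions rather than taken from the answer, is to use the median-subtracted signal only to decide which positions to keep and then read the values from the original data. The threshold 0.5 and the names residual, keepPos and kept are illustrative.

```mathematica
(* test data as generated in the question *)
datatest = RandomSample[Join[RandomReal[{0.5, 15}, 20], RandomReal[.1, 10000]]];
datatest2 = 5 + Riffle[datatest, -datatest];

(* deviation of each sample from its local median *)
residual = datatest2 - MedianFilter[datatest2, 5];

(* positions whose deviation exceeds the threshold *)
keepPos = Pick[Range[Length[residual]], Map[# > 0.5 &, Abs[residual]]];

(* original values at those positions, so the baseline of about 5 is retained *)
kept = Transpose[{keepPos, datatest2[[keepPos]]}];
ListPlot[kept, PlotRange -> All]
```

Because kept stores {position, value} pairs, the plot also preserves where in the recording each outlier occurred.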