Reply by glen herrmannsfeldt●March 25, 20052005-03-25
Matt Roos wrote:
(snip)
> I'm looking for suggestions/code on doing the following. They generally get
> more complex as the list goes on, numbers (1) and (4) are really the
> startings point and most important to me.
> 1) Given a sequence of bytes, find all occurances of that sequence in a
> binary file.
Deterministic Finite State Automata are popular here.
> 2) Given a sequence of bytes with possible wildcards (fixed wildcard length
> of one byte), find all occurances of that sequence in a binary file.
> 3) Given a sequence of bytes with possible wildcards (wildcards representing
> a variable number of bytes), find all occurances of that sequence in a
> binary file.
The ones I know are dynamic programming algorithms. It might be
that FSA can do it, but dynamic programming allows different
weights to be added to different characters, and gap lengths.
> 4-6) Same as the above except for bits rather than bytes--bits are not
> necessarily aligned with byte boundaries in any way.
This doesn't really change the algorithm, but it does change the
was the data is read. Read bytes, separate the bits, supply
them to the algorithm being used. It also tends to be slow.
For the no gap (wildcard) or fixed length gap you can consider
the byte pattern and match that. A fixed bit pattern will
require eight possible byte patterns, not including the partial
bytes at the end. With fixed gap lengths you can also find an
equivalent byte pattern.
Find the literature on dynamic programming. These problems have
been well studied, mostly in the case of DNA and protein
sequence comparison for biology.
-- glen
Reply by Matt Roos●March 25, 20052005-03-25
I've posted this on the MATLAB newsgroup, but that group is pretty noisy and
I know there are a lot of MATLAB users here, so here goes...
I'm looking for suggestions/code on doing the following. They generally get
more complex as the list goes on, numbers (1) and (4) are really the
startings point and most important to me.
1) Given a sequence of bytes, find all occurances of that sequence in a
binary file.
2) Given a sequence of bytes with possible wildcards (fixed wildcard length
of one byte), find all occurances of that sequence in a binary file.
3) Given a sequence of bytes with possible wildcards (wildcards representing
a variable number of bytes), find all occurances of that sequence in a
binary file.
4-6) Same as the above except for bits rather than bytes--bits are not
necessarily aligned with byte boundaries in any way.
It's not really necessary for this to be in MATLAB (and would probably be
much faster in C), but the files won't be huge (generally no larger than a
few hundred KB) and search seqeuences relatively short (3-5 bytes typically)
so MATLAB processing time may be acceptable.
Are there some existing tools/code I could use? General suggestions?
Thanks,
Matt
--
Remove Xs from address to reply via e-mail.
--
Remove Xs from address to reply via e-mail.