Reproducible research

Started by rajesh May 8, 2008
Very recently I came across a paper (published in 2007) about a new
algorithm the authors have developed. They published their results
using a database.

It happens that the aim of their algorithm is the same as that of one I
developed for my master's thesis (though the two algorithms differ in
their approach towards the same goal).

I wrote to the authors of the paper requesting the database so that I
could test my algorithm. But the authors say that the database is
copyright protected and the copyright owner is no longer trading it.

I have tested my algorithm on a publicly available database. How can I
compare my results with published results when I can't get the database
that they used? How can one publish results on databases which are no
longer available (when there are publicly available databases)?

Is it good to make it mandatory to publish results on publicly
available databases?

regards
Rajesh.D

On May 8, 12:43 am, rajesh <getrajes...@gmail.com> wrote:
> [..]
You don't give many details so some of this is probably irrelevant. If their work was funded by public money, like the NSF in the USA, you could complain to that agency. If the paper was published in an academic journal, you could complain to the editors about their peer review process. I know a guy who used to peer review for one of the IEEE journals and if he couldn't reproduce the claimed result, he would nix the paper. If you're of the "honey is better than vinegar" school of thought, you could ask them (the authors) to run your algorithm against their data. I don't think there is a way to make anything universally mandatory, except for death.
rajesh <getrajeshin@gmail.com> wrote in
news:0922b937-fab7-4a84-86e9-e9cc0001c26a@q27g2000prf.googlegroups.com:

> [..]
You could roll up your sleeves and run their algorithm on your data.

--
Scott
Reverse name to reply
rajesh wrote:

> I have tested my algorithm on a publicly available database. How can I
> compare my results with published results when I can't get the database
> that they have used? How can one publish results on databases which
> are no longer available (when there are publicly available
> databases)?

I do not understand where the problem is. You write down what you've done and, where relevant, point out the problems with a comparison, as you did here. Describing the methodology is part of any research.

bye,

--
piergiorgio
"rajesh" <getrajeshin@gmail.com> wrote in message 
news:0922b937-fab7-4a84-86e9-e9cc0001c26a@q27g2000prf.googlegroups.com...
> [..]
>
> I have tested my algorithm on a publicly available database. How can I
> compare my results with published results when I can't get the database
> that they have used? How can one publish results on databases which
> are no longer available (when there are publicly available
> databases)?
>
> Is it good to make it mandatory to publish results on publicly
> available databases?

You can't - obviously enough. I take this as a rhetorical question. Just do it. No it isn't.

I recommend that you take this whole thing to a higher level. Are you really interested in comparing algorithms or in checking to see if your implementation code is working? Maybe ask them for the code they used instead of the database that they don't have available. As someone else suggested, apply their algorithm to your own favorite database and go from there. If you think there's something important about the statistics or form of the database they used, ask them about that.

Don't expect to be living in a perfect world - as defined by your own framework. People often publish as an activity that's not part of their mainstream work. So, they might only be able to afford to publish what they have and not repeat the effort for your convenience or to meet some ideal model of what a "research paper" is "supposed" to be. Publishing and good research are two different, if overlapping, things.

Try repeating the results of an underwater acoustics experiment! Some might be repeatable - others not so easily or only statistically. One is at the mercy of the environment and physics.

Fred
On May 8, 12:43 am, rajesh <getrajes...@gmail.com> wrote:
> [..]
You are at their mercy. Last year I read an article reporting on tests of digital voice recorders. However, they only identified the recorders by the numbers 1-10. What use the test results were meant to be to a reader without identifying the recorders tested is another matter. Based on some factors I could compute from some of their test data, I thought they had done their tests wrong and that the performance numbers were grossly misleading. I asked them to identify the recorders so I could re-test one or more of them. They refused. I couldn't even write a 'Letter to the Editor' about their testing methods without covering all conceivable recorders or recording methods that they could be. That would have been quite a letter, compared to my retesting as few as one recorder and showing them to be wrong. I read this as they do not want to be subject to scrutiny but want to get a publication. The editor said that if I wrote the all-encompassing letter, and it was reviewed and approved for publication, and it was in the interest of the publication, as defined by the editor, the authors would then provide me the model numbers. Of course then I would have to rewrite the letter to be brief and resubject it to review for publication. I don't think so.

It is interesting that the paper had 2 authors, 2 contributors, and 5 additional reviewers BEFORE it was submitted for review for publication; they said so in their acknowledgement. Why do you suppose they did that?

I think the paper shouldn't have been published without identifying the recorders tested, but I guess the reviewers and editor did not agree.

What they are doing is extreme self protection. And it is mostly working...

I am working on a presentation (already accepted) for a publication-related national conference where I will compare results from their methods to methods that make sense, and demonstrate why their methods do not. So they will not escape very public scrutiny, in spite of having tried so hard.

I can sympathize with your situation, but people will not be fair if it is in their interest not to be.

Dirk
On May 8, 9:43 am, rajesh <getrajes...@gmail.com> wrote:
> [..]
Maybe I was exaggerating out of anguish.

The authors are from a highly reputed college in London and the journal is an IEEE Transactions.

One of the authors even expressed interest in learning about my algorithm and was hopeful that I would have an opportunity to write a paper in a journal and/or make my MATLAB code publicly available.

My results aren't superior to theirs: I get an identification rate close to 90% while they claim 95% (although these are on two different databases). So I didn't see any reason to think that they are protecting themselves. But once I showed them my results, they stopped replying to my e-mails. I had even requested their code.

The statistical nature of the database is very debatable. In the database that I have used there are test vectors which give 75% and some which give 100% results. I know the difference between the two and I can separate them by making certain observations.
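
As a rough illustration of why a 90% rate on one database and a 95% rate on another are hard to rank, the sketch below puts a Wilson score interval around each identification rate. The test-set size of 100 vectors and the per-subset outcome lists are hypothetical (the thread gives no counts), and the function names are made up for this example. The interval only reflects sampling noise; it does not correct for the two databases differing in difficulty.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def identification_rate(outcomes):
    """Fraction of test vectors identified correctly (outcomes is a list of bools)."""
    return sum(outcomes) / len(outcomes)


# ASSUMPTION: 100 test vectors per database; the thread does not give sizes.
mine = wilson_interval(90, 100)    # about 90% on the public database
theirs = wilson_interval(95, 100)  # 95% claimed on the unavailable database

# The two intervals overlap if each lower bound is below the other's upper bound.
overlap = mine[0] <= theirs[1] and theirs[0] <= mine[1]
print(f"mine:   ({mine[0]:.3f}, {mine[1]:.3f})")
print(f"theirs: ({theirs[0]:.3f}, {theirs[1]:.3f})")
print("intervals overlap:", overlap)

# ASSUMPTION: hypothetical subsets mimicking the 75% and 100% groups of test vectors.
subsets = {
    "subset_a": [True] * 15 + [False] * 5,
    "subset_b": [True] * 20,
}
for name, outcomes in subsets.items():
    low, high = wilson_interval(sum(outcomes), len(outcomes))
    rate = identification_rate(outcomes)
    print(f"{name}: rate={rate:.2f}, interval=({low:.2f}, {high:.2f})")
```

With these assumed counts the two intervals overlap, which is one more reason not to rank results obtained on different databases directly. Reporting the per-subset rates alongside the overall rate also shows how much the outcome depends on which test vectors a database happens to contain.
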
On May 9, 1:00 pm, rajesh <getrajes...@gmail.com> wrote:
> [..]
BTW, I come from an industry background; I am new to all this.
On May 9, 2:18 pm, rajesh <getrajes...@gmail.com> wrote:
> [..]
I got the code from them! So no qualms.
On Thu, 8 May 2008 13:03:18 -0700 (PDT), dbell <bellda2005@cox.net> posted:
|------------------------------------------------------------------------|
|"On May 8, 12:43 am, rajesh <getrajes...@gmail.com> wrote:              |
|> Very recently i came across a paper(published in 2007) which was      |
|> about a new algorithm the authors have developed. They have published |
|> their results using a database.                                       |
|>                                                                       |
|> It happened that the aim of the algorithm was same as that of what i  |
|> had developed for                                                     |
|> my masters thesis.(But the two algorithms differ in their approach    |
|> towards the same goal).                                               |
|>                                                                       |
|> I had written to the authors of the paper requesting for the database |
|> to test my algorithm.                                                 |
|> But the authors say that the database is copyright protected and the  |
|> copyright owner is                                                    |
|> no longer trading the database.                                       |
|>                                                                       |
|> I have tested my algorithm on a publicly available database.How can i |
|> compare my results with published results when i cant get the database|
|> that they have used ? How can one publish results on databases which  |
|> are no longer available.?(when there are publicly available           |
|> databases).                                                           |
|>                                                                       |
|> Is it good to make it mandatory to publish results on publicly        |
|> available databases ?                                                 |
|>                                                                       |
|> regards                                                               |
|> Rajesh.D                                                              |
|                                                                        |
|You are at their mercy.  Last year I read an article that was           |
|reporting on testing digital voice recorders.  However, they only       |
|identified the recorders by the numbers 1-10.  What use the test        |
|results were to be to a reader without identifying the recorders        |
|tested is another matter.  Based on some factors I could compute from   |
|some of their test data, I thought they had done their tests wrong,     |
|and the performance numbers were grossly misleading.  I asked them to   |
|identify the recorders so I could re-test one or more of them.  They    |
|refused.  I couldn't even write a 'Letter to the Editor' about their    |
|testing methods without covering all conceivable recorders or           |
|recording methods that they could be.  That would have been quite a     |
|letter, compared to my retesting as few as one recorder and showing     |
|them to be wrong.  I read this as they do not want to be subject to     |
|scrutiny but want to get a publication.  The editor said if I wrote     |
|the all encompassing letter, and it was reviewed and approved for       |
|publication,  and it was in the interest of the publication, as         |
|defined by the editor, the authors would then provide me the model      |
|numbers.  Of course then I would have to rewrite the letter to be       |
|brief and resubject it to review for publication.  I don't think so.    |
|                                                                        |
|It is interesting that the paper had 2 authors, 2 contributors, and 5   |
|additional reviewers BEFORE it was submitted for review for             |
|publication; they said so in their acknowledgement. Why do you suppose  |
|they did that?                                                          |
|                                                                        |
|I think the paper shouldn't have been published without identifying     |
|the recorders tested, but I guess the reviewers and editor did not      |
|agree.                                                                  |
|                                                                        |
|What they are doing is extreme self protection. And it is mostly        |
|working...                                                              |
|                                                                        |
|[..]                                                                    |
|                                                                        |
|I can sympathize with your situation, but people will not be fair if    |
|it is in their interest not to be.                                      |
|                                                                        |
|Dirk"                                                                   |
|------------------------------------------------------------------------|

I agree that what is presented as science is too often not science.

Elsewhere in the thread, rajesh <getrajeshin@gmail.com> posted on
Fri, 9 May 2008 03:04:19 -0700 (PDT):

|------------------------------------------|
|"I got the code from them ! So no qualms."|
|------------------------------------------|

Good, but not good enough. Thirty years later, someone might wish to check
something in the paper which Rajesh mentioned or the paper which Dirk mentioned.
All the coauthors could be dead by then, or the relevant material absent from
the paper may have since been lost, so informal short-term arrangements for
access (which worked in Rajesh's case) might be of no help to someone thirty
years later. This is not good enough.

One case which I do not believe to be fraudulent is a file supposedly on
the Internet mentioned in the scientific book "Formal Hardware Verification:
Methods and Systems in Comparison" edited by Kropf and published by Springer.
I tried to download the file many years after publication but it was no
longer available, supposedly and probably because it was no longer anywhere
near the state of the art. That is not a good enough reason. One cannot
study the progression of scientific advancement by only looking at
the current state of the art.

Some other cases which displeased me without necessarily being cases of
fraud are the lack of availability of Verischemelog which was alleged to
exist in James Jennings; and Eric Beuscher,
"Verischemelog- Verilog embedded in Scheme", DSL, 1999 and the lack of
availability of the Sheffield University Plasmasphere Ionosphere Model
(SUPIM) which was alleged to exist in J. R. Souza; G. J. Bailey;
M. A. Abdu; and I. S. Batista, "Comparisons of Low Latitude F Region
Peak Densities, Heights and Equatorial ExB Drift from IRI with
Observational Data and the Sheffield University Plasmasphere Ionosphere
Model", "Adv. Space Res.", Vol. 31, No. 3, 2003. I went through the
supposedly proper channels to obtain these before 2008 and so far I
have not even received an acknowledgement of a request from any of
the co-authors. In commercial advertisements in the trade press one
does not need to make one's product available, but of course these
were supposed to be scientific publications so if something is published
then it by definition must be available, and there should not be an
application procedure.

|------------------------------------------------------------------------|
|"I am working on a presentation (already accepted) for a publication-   |
|related national conference where I will compare results from their     |
|methods to methods that make sense, and demonstrate why their methods   |
|do not. So they will not escape very public scrutiny, in spite of       |
|having tried so hard."                                                  |
|------------------------------------------------------------------------|

I am a victim of a real case of scientific fraud (not mentioned above),
such that I was recently forced to resign after I uncovered misconduct by a
supposed professor (the supposed tutor of my aborted attempt at a Ph.D.).
Fortunately, I do have enough data from what had already been published in
related work to expose some of this fraud, and I have been collecting
examples of fraud or allegations of fraud or lack of adequate refereeing
in diverse fields (e.g. biology and physics) to show that the perception
of the widespread existence of science may be mistaken.

Dirk, please let me know how I would be able to obtain a copy of
your aforementioned presentation. It may be another useful example to cite.

Rajesh posted on Fri, 9 May 2008 01:00:46 -0700 (PDT):

|----------------------------------------------------------------------|
|"[..]                                                                 |
|                                                                      |
|The authors are from a highly reputed college in London and journal is|
|an IEEE transactions.                                                 |
|                                                                      |
|[..]"                                                                 |
|----------------------------------------------------------------------|

So? Isaac Newton's conviction against the wave nature of light was
mistaken, and Albert Einstein retracted what he called the biggest
blunder of his life. Reputation should have no bearing on evaluating
whether a claim is scientifically valid.

One example in the literature of what seems to be an
oversight by the coauthors and referees
is an implication that neutrons are significant for
causing single event effects on spacecraft (they are
instead well known for causing single event effects on aircraft
and at sea level) in Egas Henes Neto;
Ivandro Ribeiro; Michele Vieira; Gilson Wirth; and
Fernanda Lima Kastensmidt, "Using Bulk Built-in Current
Sensors to Detect Soft Errors", "IEEE Micro",
September-October 2006.

Another example is an IEEE paper by Laranjeira (of which I no longer have a
copy), which Henderson-Sellers claimed is flawed mathematically on Page 118 of
his book "Object-Oriented Metrics: Measures of Complexity", Prentice Hall,
1996, ISBN 0132398729 (to which I also no longer have access).

You may also wish to read the thread
"Scientific puzzle of formal circuit verification at next week's DAC"
from the newsgroup comp.cad.synthesis from June 2002.

Yours sincerely,
Colin Paul Gloster