DSPRelated.com
Forums

Aggression Detection

Started by annina July 30, 2008
Hello,

I posted this earlier in Matlab DSP but somebody said I should post it
here.

At the request of a friend I started a project to detect aggression in
workplace meeting situations using voice processing:
http://code.google.com/p/aggressiondetector/

The situation that the project seeks to address was described as
follows:

    "Everyone starts out calm, but then an idea will be debated. Some
debaters are under the impression that the loudest, most aggressive
argument will win. So they start to yell. Gradually, the noise increases
from discussion to yelling, because others feel as if they need to yell to
be heard over the din of the original yeller."  More of this is described
in the spec sheet: 
http://code.google.com/p/aggressiondetector/wiki/SpecSheet

Research suggests that the three best cues for verbal aggression are
fundamental frequency, the ratio of signal energy below and above 1000 Hz
(RE), and the standard deviation of the energy of the three highest peaks
in the spectrum.
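
As a rough illustration of two of these cues, here is a minimal Python sketch (not the project's MATLAB code) that computes RE and the peak-energy standard deviation for a single frame. The function names, the 8 kHz sample rate, the 256-sample frame, and the definition of a "peak" as a local maximum of per-bin power are all assumptions made for the example; the cited research may define these quantities differently.

```python
import cmath
import math
import statistics

def power_spectrum(frame, sample_rate):
    """Naive O(N^2) DFT; returns (frequency_hz, power) for bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append((k * sample_rate / n, abs(acc) ** 2))
    return spectrum

def energy_ratio(spectrum, split_hz=1000.0):
    """RE: energy below split_hz divided by energy at or above it."""
    low = sum(p for f, p in spectrum if f < split_hz)
    high = sum(p for f, p in spectrum if f >= split_hz)
    return low / high if high > 0 else math.inf

def top_peak_energy_std(spectrum, count=3):
    """Std. dev. of the energies of the `count` highest local maxima.

    Assumption: a "peak" is a bin whose power exceeds its left neighbour
    and is not below its right neighbour. Returns None if fewer than
    `count` peaks exist.
    """
    powers = [p for _, p in spectrum]
    peaks = [powers[i] for i in range(1, len(powers) - 1)
             if powers[i - 1] < powers[i] >= powers[i + 1]]
    top = sorted(peaks, reverse=True)[:count]
    return statistics.pstdev(top) if len(top) == count else None

# Synthetic test frame (hypothetical data): three tones at 500, 2000, 3000 Hz.
fs = 8000
frame = [math.sin(2 * math.pi * 500 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 2000 * i / fs)
         + 0.25 * math.sin(2 * math.pi * 3000 * i / fs)
         for i in range(256)]

spec = power_spectrum(frame, fs)
re_value = energy_ratio(spec)
peak_std = top_peak_energy_std(spec)
```

On real recordings one would apply this per windowed frame and track the values over time; a single frame says nothing about aggression on its own.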

I have started to write some MATLAB scripts that look at frequency ranges
over time, and I am now using those to look at recordings of angry vs.
non-angry conversations.

I was wondering if anyone is doing something similar and/or would be
willing to share code that would help in the endeavor of aggression
detection in work meeting environments as described above. 

Annina



annina wrote:
> Hello,
>
> At the request of a friend I started a project to detect aggression in
> workplace meeting situations using voice processing
>
> "Everyone starts out calm, but then an idea will be debated. Some
> debaters are under the impression that the loudest, most aggressive
> argument will win. So they start to yell. Gradually, the noise increases
> from discussion to yelling, because others feel as if they need to yell to
> be heard over the din of the original yeller."
>
> Research suggests that the three best cues for verbal aggression are
> fundamental frequency, the ratio of signal energy below and above 1000 Hz
> (RE), and the standard deviation of the energy of the three highest peaks
> in the spectrum.
>
> I have started to write some matlab scripts that look at frequency ranges
> over time and now using those to look at recordings of angry vs. non-angry
> conversations.
1. The whole idea seems flawed because of the oversimplification. How about
   approaching the simpler task of automatic spam detection? Or parsing
   the newsgroup traffic and sorting out the idiots?
2. Even if it could detect aggression, what would you do then? Call 911,
   engage the water sprinklers?
3. The really dangerous people are those who don't talk much.
> I was wondering if anyone is doing something similar and/or would be
> willing to share code that would help in the endeavor of aggression
> detection in work meeting environments as described above.
Detecting aggression requires a specialized speech recognition engine with
AI. Even if you have all of that, the success rate is very questionable.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

On 30 Jul, 20:52, "annina" <anni...@gmail.com> wrote:
> Hello,
>
> I posted this earlier in Matlab DSP but somebody said I should post it
> here.
>
> At the request of a friend I started a project to detect aggression in
> workplace meeting situations using voice processing
> http://code.google.com/p/aggressiondetector/
>
> The situation that the project seeks to address was described as
> follows:
>
>     "Everyone starts out calm, but then an idea will be debated. Some
> debaters are under the impression that the loudest, most aggressive
> argument will win. So they start to yell. Gradually, the noise increases
> from discussion to yelling, because others feel as if they need to yell to
> be heard over the din of the original yeller." More of this is described
> in the spec sheet:
> http://code.google.com/p/aggressiondetector/wiki/SpecSheet
>
> Research suggests that the three best cues for verbal aggression are
> fundamental frequency, the ratio of signal energy below and above 1000 Hz
> (RE), and the standard deviation of the energy of the three highest peaks
> in the spectrum.
>
> I have started to write some matlab scripts that look at frequency ranges
> over time and now using those to look at recordings of angry vs. non-angry
> conversations.
>
> I was wondering if anyone is doing something similar and/or would be
> willing to share code that would help in the endeavor of aggression
> detection in work meeting environments as described above.
Forget about detecting aggression. The best you can do is to detect
aggression *indicators*, which is a completely different thing.

Consider cats and dogs: An aggressive dog will murr while an aggressive
cat will flick its tail. The two aggression *indicators* of murring and
tail-flicking are only valid in the contexts of dogs and cats,
respectively. A *dog* which flicks its tail or a cat which purrs is in a
benevolent or playful mood. So if your aggression *indicator* is applied
in the wrong context, the detection result might be 180 degrees wrong.

To find a similar human difference of context, consider the Norwegian and
Danish languages. They are very similar; they share the same root (Old
Norse) and substantial parts of the vocabulary. However, for some reason
Danes are very often perceived - wrongly, according to my Danish
friends! - by Norwegians as sarcastic or condescending. The perception has
to do with both the choice of words or phrasings and the 'voiced' manner
in which the statements are delivered. So a 'sarcasm detector' developed
based on the Norwegian language will flag >70% of the time if applied to
Danish speech or text.

As for frequency contents in the voice, there may be plenty of other
reasons for the voice to change. Some people just don't cope well with
speaking in front of an audience, or may be nervous for a number of
reasons other than aggression.

Trying to detect aggression as such is a very, very demanding goal.

Rune

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message 
news:841f0361-f26d-4dff-affa-0a1f6962f44b@z66g2000hsc.googlegroups.com...
As for frequency contents in the voice, there may be plenty of
other reasons for the voice to change. Some people just don't
cope well with speaking in front of an audience, or may be
nervous for a number of reasons other than aggression.

=======
An angry mob is distinguishable from a stadium full of ecstatic soccer fans. 
It might also be possible to analyse one person's mutterings. A small 
isolated group, though, does indeed offer challenges.

>>>>>>>>
Trying to detect aggression as such is a very, very demanding goal.

========
OTOH, one might presume that some level of aggression in some form or
another is always present. ;)

On 2 Aug, 05:49, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
> "Rune Allnor" <all...@tele.ntnu.no> wrote in message
> news:841f0361-f26d-4dff-affa-0a1f6962f44b@z66g2000hsc.googlegroups.com...
> As for frequency contents in the voice, there may be plenty of
> other reasons for the voice to change. Some people just don't
> cope well with speaking in front of an audience, or may be
> nervous for a number of reasons other than aggression.
>
> =======
> An angry mob is distinguishable from a stadium full of ecstatic soccer fans.
> It might also be possible to analyse one person's mutterings. A small
> isolated group, though, does indeed offer challenges.
It might be possible for a human being to distinguish between the two
based on qualitative judgements of the behaviour of the two groups.
*Quantifying* the difference, which is required if the OP is to achieve
the stated goal, is a completely different thing. Hence my comments about
the context being important.
> Trying to detect aggression as such is a very, very demanding goal.
>
> ========
> OTOH, one might presume that some level of aggression in some form or
> another is always present. ;)
Sure. Some of the strongest expressions of aggression are the quiet,
intense statements, accompanied by a certain body language. How do you
detect them by means of spectral content or voice loudness?

Rune

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message 
news:5fb15644-6c8c-4106-b70c-ac8afabfdbcb@z66g2000hsc.googlegroups.com...
> On 2 Aug, 05:49, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
>> "Rune Allnor" <all...@tele.ntnu.no> wrote in message
>> news:841f0361-f26d-4dff-affa-0a1f6962f44b@z66g2000hsc.googlegroups.com...
>> As for frequency contents in the voice, there may be plenty of
>> other reasons for the voice to change. Some people just don't
>> cope well with speaking in front of an audience, or may be
>> nervous for a number of reasons other than aggression.
>>
>> =======
>> An angry mob is distinguishable from a stadium full of ecstatic soccer
>> fans.
>> It might also be possible to analyse one person's mutterings. A small
>> isolated group, though, does indeed offer challenges.
>
> It might be possible for a human being to distinguish between the two
> based on qualitative judgements of the behaviour of the two groups.
> *Quantifying* the difference, which is required if the OP is to achieve
> the stated goal, is a completely different thing. Hence my comments
> about the context being important.
The hubbub of a large crowd, I expect, can be easily measured and analyzed
with some success.

An individual voice might also be reasonably analyzed. I don't know for
certain that it can, or that you can identify reliable indicators. A small
group, the OP context, has different challenges. I have doubts that tonal
or amplitude metrics can be reliable. However, peaceable discussions
generally have an identifiable rhythm and flow. Generally, one person
speaks at a time. Continual interruptions in mid-sentence might be one
indicator, and likely detectable and measurable.

As to what to do with that information... You could fine the offender one
credit for each such violation of the Verbal Morality Statute.
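
The interruption indicator could be sketched roughly as follows. This is only an illustration in Python and it assumes the hard part is already solved: that some earlier (hypothetical) speaker-segmentation step has produced labelled segments with start and end times. The `Segment` type, the `count_interruptions` name, and the 0.5 s overlap threshold are all assumptions invented for the example.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds

def count_interruptions(segments, min_overlap=0.5):
    """Count, per speaker, the segments that start while another speaker's
    earlier segment is still running and overlap it by at least
    `min_overlap` seconds."""
    counts = {}
    ordered = sorted(segments, key=lambda s: s.start)
    for i, seg in enumerate(ordered):
        for earlier in ordered[:i]:
            if earlier.speaker == seg.speaker:
                continue
            # earlier.start <= seg.start, so the overlap begins at seg.start
            overlap = min(earlier.end, seg.end) - seg.start
            if overlap >= min_overlap:
                counts[seg.speaker] = counts.get(seg.speaker, 0) + 1
                break  # count each new segment at most once
    return counts

# Hypothetical meeting fragment.
meeting = [
    Segment("A", 0.0, 5.0),
    Segment("B", 4.0, 9.0),    # starts 1.0 s before A finishes
    Segment("A", 9.1, 12.0),   # waits for B to finish
    Segment("C", 11.8, 13.0),  # overlaps A by only about 0.2 s
    Segment("B", 12.2, 15.0),  # starts 0.8 s before C finishes
]

interruptions = count_interruptions(meeting)
```

A count like this measures turn-taking behaviour only; whether frequent interruptions indicate aggression rather than plain excitement is exactly the open question in this thread.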
>> Trying to detect aggression as such is a very, very demanding goal.
>>
>> ========
>> OTOH, one might presume that some level of aggression in some form or
>> another is always present. ;)
>
> Sure. Some of the strongest expressions of aggression are the quiet,
> intense statements, accompanied by a certain body language. How
> do you detect them by means of spectral content or voice loudness?
I had in mind my SIL. The detector should always be triggered in her presence, regardless of the spoken tone or the words that come out.
On 2 Aug, 16:32, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
> "Rune Allnor" <all...@tele.ntnu.no> wrote in message
> news:5fb15644-6c8c-4106-b70c-ac8afabfdbcb@z66g2000hsc.googlegroups.com...
>
> > On 2 Aug, 05:49, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
> >> "Rune Allnor" <all...@tele.ntnu.no> wrote in message
> >> news:841f0361-f26d-4dff-affa-0a1f6962f44b@z66g2000hsc.googlegroups.com...
> >> As for frequency contents in the voice, there may be plenty of
> >> other reasons for the voice to change. Some people just don't
> >> cope well with speaking in front of an audience, or may be
> >> nervous for a number of reasons other than aggression.
> >>
> >> =======
> >> An angry mob is distinguishable from a stadium full of ecstatic soccer
> >> fans.
> >> It might also be possible to analyse one person's mutterings. A small
> >> isolated group, though, does indeed offer challenges.
> >
> > It might be possible for a human being to distinguish between the two
> > based on qualitative judgements of the behaviour of the two groups.
> > *Quantifying* the difference, which is required if the OP is to achieve
> > the stated goal, is a completely different thing. Hence my comments
> > about the context being important.
>
> The hubbub of a large crowd, I expect, can be easily measured and analyzed
> with some success.
Sure. But the question is how to map the correct 'emotion' to that hubbub:
the OP wants a lynch mob to trigger an alarm while a soccer spectator crowd
passes.
> An individual voice might also be reasonably analyzed. I don't know for
> certain that it can, or that you can identify reliable indicators. A small
> group, the OP context, has different challenges. I have doubts that tonal or
> amplitude metrics can be reliable. However, peaceable discussions generally
> have an identifiable rhythm and flow. Generally, one person speaks at a
> time. Continual interruptions in mid-sentence might be one indicator, and
> likely detectable and measurable.
But again, the problem as stated is to specifically detect 'aggression' as
opposed to nondescript excitement.

Rune

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message 
news:be89b511-cb35-4547-9c3c-be65e2d3af24@34g2000hsf.googlegroups.com...
> On 2 Aug, 16:32, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
>> amplitude metrics can be reliable. However, peaceable discussions
>> generally have an identifiable rhythm and flow. Generally, one person
>> speaks at a time. Continual interruptions in mid-sentence might be one
>> indicator, and likely detectable and measurable.
>
> But again, the problem as stated is to specifically detect
> 'aggression' as opposed to nondescript excitement.
There ya go. Sounds like you're well on your way.
Rune Allnor wrote:

   ...

> But again, the problem as stated is to specifically detect
> 'aggression' as opposed to nondescript excitement.
I can usually tell pretty accurately with dogs, but I can't quantify how.
Then again, dogs don't usually try to hide their feelings.

Jerry
--
Engineering is the art of making what you want from things you can get.

On 2 Aug, 18:40, Jerry Avins <j...@ieee.org> wrote:
> Rune Allnor wrote:
>
>    ...
>
> > But again, the problem as stated is to specifically detect
> > 'aggression' as opposed to nondescript excitement.
>
> I can usually tell pretty accurately with dogs, but I can't quantify
> how. Then again, dogs don't usually try to hide their feelings.
Use the same clues (murring, ears along the neck) with a cat and you might
find yourself in an awkward position. Then get the cat angry and see how
the purring all of a sudden becomes 'screeching' (sorry, don't know the
correct English term) and the ears are not along the neck but almost dug
into the neck. *Almost* the same clues as when the cat is in a good mood,
only different...

No wonder lots of people think cats are unpredictable.

Rune