Albretch Mueller Guest
|
Posted: Thu Jul 03, 2008 11:17 pm Post subject: adaptable FSM (finite state machine) for NLP |
|
|
There are many languages that are very similar. In many cases, to a large
extent if not fully, they share the same alphabet, syntax rules and even
phonemes.
So, since parsers are essentially fed text sequentially (and naturally so),
I wonder what approaches have been developed out there for pluggable parsing
strategies (depth or breadth first), some lexicon (which does not have to be
totally complete) and rules describing the generative possibilities of this
lexicon.
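To make the question concrete, here is a minimal sketch of what such a setup could look like: a declarative rule table, a small lexicon, and a top-down recognizer whose search order (depth or breadth first) is a plug-in choice. The grammar, the lexicon entries and the function name `recognize` are all made up for illustration; they do not come from any real grammar file.

```python
from collections import deque

# Hypothetical toy grammar and lexicon, invented for this sketch.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
    "sleeps": {"V"},
}

def recognize(words, strategy="depth"):
    """Top-down recognizer; only the agenda discipline depends on the strategy."""
    # Each state is (symbols still to expand, position in the input).
    agenda = deque([(("S",), 0)])
    while agenda:
        # Depth first takes the newest state, breadth first the oldest.
        symbols, pos = agenda.pop() if strategy == "depth" else agenda.popleft()
        if not symbols:
            if pos == len(words):
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Nonterminal: add one new state per rule expansion.
            for expansion in GRAMMAR[first]:
                agenda.append((tuple(expansion) + rest, pos))
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            # POS category: consume one word if the lexicon allows it.
            agenda.append((rest, pos + 1))
    return False

print(recognize("the dog sees the cat".split(), "depth"))
print(recognize("the dog sees the cat".split(), "breadth"))
```

Both strategies accept or reject the same sentences here; they differ only in the order in which states are explored. The sketch assumes a grammar without left recursion, since a rule like NP -> NP PP would make this naive top-down search loop forever.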
As you could tell I am not a linguist myself, but after reading James
Allen's Natural Language Understanding, in which he, even if the theory is
general, exclusively uses plenty of examples of English, I think such
an "English Grammar file" may not be that difficult to device and if you do
it for English I could easily imagine that there are such files for other
NL which definitely are less fractured/more homogeneous.
Where can you find actual well-formed, declarative description of some NL
grammar including language features, constraints and everything
(possible) in XML format or Backus-Naur form or such theoretical
studies?
thanks
lbrtchx |
|
Ted Dunning Guest
|
Posted: Fri Jul 04, 2008 7:30 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
James Allen is a good guy and his book is nearly the last word on the
subject. The trouble is that this last word was 14 years ago (and
that is the second edition). Since then the field has been largely
revolutionized by statistical approaches.
The problem with what you seek is that there really aren't simple
declarative grammars of English that do you very much good. There are
grammars that have reasonable coverage, but they are really
complicated and large and more often than not, statistically driven.
There are also simple non-statistical grammars around, but they tend
to produce strange parses in many cases and no parse at all in a very
large fraction of the cases. In most cases, grammars that are not
statistically informed are brittle and produce bizarre results when
fed language outside their original domain.
This is exactly the difficulty that has led to statistical NLP in
general.
|
TGV Guest
|
Posted: Fri Jul 04, 2008 7:41 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
On Jul 3, 8:17 pm, Albretch Mueller <lbrt...@gmail.com> wrote:
| Quote: | I think such an "English Grammar file" may not be that difficult to device
|
That should be "to devise", but let me assure you: it is difficult.
Very difficult. There are so many problems at so many levels, it's
almost impossible to start describing them, but an important one is
ambiguity: if you come up with a set of rules that cover (nearly)
every allowed English expression and assign a meaningful structure,
then you're going to have huge amounts of unwanted analyses. Getting
rid of them is more difficult than building them.
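The ambiguity problem can be seen even in a toy setting. The sketch below counts the distinct parse trees a small grammar assigns to a sentence, using a CKY-style chart. The grammar, lexicon and the name `count_parses` are hypothetical, chosen only to show how prepositional phrases multiply the analyses.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "park": ["N"], "telescope": ["N"],
    "with": ["P"], "in": ["P"],
}

def count_parses(words, start="S"):
    n = len(words)
    # chart[i][j][X] = number of distinct X-trees spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("I saw the man with the telescope".split()))
print(count_parses("I saw the man in the park with the telescope".split()))
```

With this grammar the first sentence gets 2 analyses and the second gets 5, and each further prepositional phrase makes the count grow faster still. Picking the intended analysis is the part the rules alone do not give you.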
Also, human language understanding deviates from the rules if it needs
to, e.g., in case of obvious errors but with a clear context/meaning
available. Your spelling error above is a case in point: I could
understand you, even though you made a grammatical error there (device
is a legal word, but a noun). I can also understand Jabberwocky to a
certain degree, and I don't have problems reading Shakespearean lines
such as “How much more praise deserved thy beauty’s use” although the
subject is not in the proper position.
| Quote: | and if you do it for English I could easily imagine that there are such files for other
NL which definitely are less fractured/more homogeneous
|
Which languages would that be? English is one of the easiest languages
around. From your name I infer you're a speaker of German. Germanic
languages have a word order that is much more liberal than that in
English, which makes analyzing it even harder.
| Quote: | Where can you find actual well-formed, declarative description of some NL
grammar including language features, constraints and everything
(possible) in XML format or Backus-Naur form or such theoretical
studies?
|
You might try Link Grammar (http://en.wikipedia.org/wiki/Link_grammar).
It's the only open source project I can think of that
has decent coverage and grammars for different languages. |
|
Ian Parker Guest
|
Posted: Fri Jul 04, 2008 11:26 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
On Jul 4, 8:41 am, TGV <theovo...@yahoo.com> wrote:
| Quote: | Which languages would that be? English is one of the easiest languages
around. From your name I infer you're a speaker of German. Germanic
languages have a word order that is much more liberal than that in
English, which makes analyzing it even harder.
|
Not quite true. English does not have inflexion and agreement. Current
parsers are designed to use word order not case. Things like CLAWS
work out the case from context.
If you have a text in Latin, Greek or Arabic you have a case and
gender presented to you in endings. You are simply working backwards
and working out where something ending in "Beth" would fit in English.
Is there a parser which can do this, or would something have to be
done from scratch?
BTW - I have found Google Arabic to be most unsatisfactory; it does not
seem to take cognizance of endings.
- Ian Parker |
|
Ian Parker Guest
|
Posted: Fri Jul 04, 2008 8:26 pm Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
On 4 Jul, 20:17, Albretch Mueller <lbrt...@gmail.com> wrote:
| Quote: | James Allen . . .
~
I have also read Manning and Schuetze's "Foundations of Statistical Natural
Language Processing" and Jelinek's Statistical Methods for Speech
Recognition as well
~> there really aren't simple declarative grammars of English that do you
very much good.
. . ., but they are really complicated and large and more often than not,
statistically driven.
~
I see
~> That should be "to devise"
You might try Link Grammar (http://en.wikipedia.org/wiki/Link_grammar)
~
thank you
~> . . . ambiguity
~
"Ambiguity" is not really an issue for me. I just need to know if/when some
word has certain POS category
~
and if you do it for English I could easily imagine that there are such
files for other
NL which definitely are less fractured/more homogeneous
Which languages would that be? English is one of the easiest languages
around.
~
Well, you may get a second opinion from people doing TTS and STT
software
~
Granted! English is a very resourceful lang in which in many cases you can
reuse the same words as a noun, verb, adverb and adj. . . .
~
But then what you -see- as "easy" you also implicitly regard as "ambiguous"
~
There is a reason why they have spelling bee contests in English, while in
other languages they don't make any sense whatsoever
~
I don't think that, considering them as a whole, there are
languages "better" or "easier" than any other ones
~
lbrtchx
|
CLAWS parses English to 97% accuracy, or so I am told. If I have a
CLAWS output I can easily translate into an ending-based language.
Ending-based languages have a tendency to put words first for
emphasis. German does not really do this. There is in fact quite a
strict word order. CLAWS parses subordinate clauses, which go at the end
in German. If I were to say what the loudspeakers say at 5 am
"La illu illahu ...."
The La tells us there is NO God but Allah (Emphasis)
Latin is the same. My point is that I get my CLAWS output from
English. If I translate Arabic the words can be marked up in CLAWS as
they are translated. I can permute the words so that they give the
same labels as Arabic from English.
It is clear I can do this by trial and error, but there should be a
better way. The reference does not really tell me whether there is a
program which can do it.
- Ian Parker |
|
Ian Parker Guest
|
Posted: Fri Jul 04, 2008 9:26 pm Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
I have now looked at your reference again. Thank you, it was extremely
useful. I have a program which can convert Arabic into the required
format.
For translation we still require a match with English via CLAWS.
- Ian Parker |
|
Albretch Mueller Guest
|
Posted: Sat Jul 05, 2008 12:17 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
| Quote: | James Allen . . .
~ |
I have also read Manning and Schuetze's "Foundations of Statistical Natural
Language Processing" and Jelinek's Statistical Methods for Speech
Recognition as well
~
| Quote: | there really aren't simple declarative grammars of English that do you
very much good.
. . ., but they are really complicated and large and more often than not,
statistically driven. |
~
I see
~
| Quote: | That should be "to devise"
You might try Link Grammar (http://en.wikipedia.org/wiki/Link_grammar)
~ |
thank you
~
"Ambiguity" is not really an issue for me. I just need to know if/when some
word has certain POS category
~
| Quote: | and if you do it for English I could easily imagine that there are such
files for other
NL which definitely are less fractured/more homogeneous
Which languages would that be? English is one of the easiest languages
around.
~ |
Well, you may get a second opinion from people doing TTS and STT
software
~
Granted! English is a very resourceful lang in which in many cases you can
reuse the same words as a noun, verb, adverb and adj. . . .
~
But then what you -see- as "easy" you also implicitly regard as "ambiguous"
~
There is a reason why they have spelling bee contests in English, while in
other languages they don't make any sense whatsoever
~
I don't think that, considering them as a whole, there are
languages "better" or "easier" than any other ones
~
lbrtchx |
|
Albretch Mueller Guest
|
Posted: Sat Jul 05, 2008 12:18 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
I think I will be more than happy if I get just word lists with the POS
category for each word
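As a rough sketch of how such a list could be used, assuming a plain text format with one `word<TAB>category` pair per line (this format, the sample entries and the function names are assumptions, not taken from any existing resource):

```python
from collections import defaultdict

def load_lexicon(lines):
    """Build a word -> set of POS categories map from 'word<TAB>category' lines."""
    lexicon = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, category = line.split("\t")
        lexicon[word.lower()].add(category)
    return lexicon

def has_pos(lexicon, word, category):
    """True if the list records this category for the word."""
    return category in lexicon.get(word.lower(), set())

# Hypothetical sample entries; a real list would be read from a file.
SAMPLE = [
    "light\tnoun", "light\tverb", "light\tadjective",
    "run\tnoun", "run\tverb",
    "quickly\tadverb",
]

lexicon = load_lexicon(SAMPLE)
print(sorted(lexicon["light"]))
print(has_pos(lexicon, "Run", "verb"))
```

A word maps to a set rather than a single tag because, as noted above, English reuses the same word in several categories.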
~
lbrtchx |
|
TGV Guest
|
Posted: Sat Jul 05, 2008 5:46 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
| Quote: | Not quite true. English does not have inflexion and agreement. Current
parsers are designed to use word order not case. Things like CLAWS
work out the case from context.
|
Parsers I've worked on all took agreement/inflection into account.
Then again, they were designed for robustness, so they had to.
| Quote: | If you have a text in Latin, Greek or Arabic you have a case and
gender presented to you in endings. You are simply working backwards
and working out where something ending in "Beth" would fit in English.
|
If it only were that simple. Parsing Latin is tricky business, mostly
because of embedding and ambiguity. Suppose you've got a phrase with
two masculine words ending in -us, which one's the subject?
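A small sketch of why the ending alone does not settle it. The table lists readings that an "-us" ending can carry on a masculine noun in the second and fourth declensions; it is deliberately incomplete (vocatives and other genders are left out), and the names `US_READINGS`, `readings` and `subject_candidates` are made up for this example.

```python
# Readings an "-us" ending can carry on a masculine noun, by declension.
# Not exhaustive; vowel length, which separates some of these forms,
# is not written in ordinary text.
US_READINGS = [
    ("2nd declension", "nominative", "singular"),  # e.g. dominus
    ("4th declension", "nominative", "singular"),  # e.g. fructus
    ("4th declension", "genitive", "singular"),
    ("4th declension", "nominative", "plural"),
    ("4th declension", "accusative", "plural"),
]

def readings(word):
    """All readings the ending permits, without consulting a lexicon."""
    return US_READINGS if word.lower().endswith("us") else []

def subject_candidates(words):
    """Words whose ending allows at least one nominative reading."""
    return [
        w for w in words
        if any(case == "nominative" for _, case, _ in readings(w))
    ]

phrase = ["dominus", "fructus", "videt"]
print(subject_candidates(phrase))
```

Going by endings only, both "dominus" and "fructus" remain possible subjects. Narrowing that down takes a lexicon that records each noun's declension, plus agreement with the verb and the wider context.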
| Quote: | Is there a parser which can do this, or would something have to be done from scratch?
|
There have been people working on parsing Latin, of course. I never
looked into it seriously, but I know from my personal environment that
Kees Koster and his daughter Tineke took a shot at one. Here's a URL
to his publications (look for 2005): http://www.cs.ru.nl/~kees/home/biblio.html |
|
Ian Parker Guest
|
Posted: Sun Jul 06, 2008 9:06 am Post subject: Re: adaptable FSM (finite state machine) for NLP |
|
|
On 5 Jul, 06:46, TGV <theovo...@yahoo.com> wrote:
| Quote: | Not quite true. English does not have inflexion and agreement. Current
parsers are designed to use word order not case. Things like CLAWS
work out the case from context.
Parsers I've worked on all took agreement/inflection into account.
Then again, they were designed for robustness, so they had to.
If you have a text in Latin, Greek or Arabic you have a case and
gender presented to you in endings. You are simply working backwards
and working out where something ending in "Beth" would fit in English.
If it only were that simple. Parsing Latin is tricky business, mostly
because of embedding and ambiguity. Suppose you've got a phrase with
two masculine words ending in -us, which one's the subject?
|
You take the English order. Two "us" would seem to indicate a
subordinate clause. You can't have two subjects unless you have a compound
noun which has somehow got separated. There is a difference between
prose (Caesar and Tacitus) and poetry (Virgil and Ovid). Virgil and
Ovid are much more inclined to do odd things than is Caesar. Virgil
and Ovid maintain a rhythmic structure called "Iambic Pentameters".
| Quote: | Is there a parser which can do this, or would something have to be done from scratch?
There have been people working on parsing Latin, of course. I never
looked into it seriously, but I know from my personal environment that
Kees Koster and his daughter Tineke took a shot at one. Here's a URL
to his publications (look for 2005): http://www.cs.ru.nl/~kees/home/biblio.html
|
Take Latin and you have Arabic, at least from the parsing standpoint.
Does the Koran contain sentences with 2 nouns ending in "u"? I suspect
it probably does as people sway (Iambic Pentameters) when they recite
it.
I say this to argue that a Latin parser would not just be a parser for
a dead language (not completely dead as the Vatican still uses it) but
could be adapted to all inflected languages.
Thank you for the reference.
- Ian Parker |
|