A Computational Framework for Simulating Cross-Linguistic Acquisition of Spatial Prepositions
University of Southern California, Los Angeles
pamela.fox at alumni.usc.edu
Quick links:
- Introduction
- Background Terminology & Research
- Spatial Preposition Learning Methods
- Implementation
- Results
- Conclusion
- Acknowledgments
- References
INTRODUCTION
The current trend in teaching computers about human languages is to combine massive amounts of data with brute-force statistical methods.
This project instead aims to teach a computer a language the way a child would learn it: by analyzing a 3-dimensional world together with linguistic input.
Because spatial prepositions describe basic 3-dimensional concepts that must be acquired early in childhood, they are the current focus of this project.
BACKGROUND TERMINOLOGY & RESEARCH
We first review the terminology and the agreed-upon categories of spatial prepositions, so that we can decide later whether children might use different strategies for different categories.
- Figure vs. Ground
In a sentence where a spatial preposition connects two objects, the object to be located is termed the “figure” and the object it is located relative to is termed the “ground,” following the work of Leonard Talmy.
- Frames of Reference
- None (Topological)
When a spatial preposition used to describe a situation is invariant under alternative frames of reference, we term it “topological,” in the sense that the topological properties of the projective geometry are not affected by a transformation such as stretching, compressing, moving, or bending.
Examples of English topological prepositions are “in,” “inside,” and “on.”
- Horizontal-Intrinsic
The intrinsic frame of reference locates figures according to some “intrinsic” features of the ground object. A feature of the ground object (e.g. side, facet) is treated as the new ground, and the figure is located within some angle and distance of that feature.
Features for non-animal objects are generally based on the object's functionality, shape, or even motion. (Animal features are based on correspondence to the human body.)
Feature assignment may vary radically across languages.
Example of English statement made from intrinsic frame of reference: “The man is in front of the house” (when the man is standing outside the side of the house that has the front door).
Note there is often an ambiguity between statements made from intrinsic frame of reference vs. relative frame of reference, discussed next.
- Horizontal-Relative
The relative frame of reference treats both the original figure and ground as figures and locates them within some angle and distance of an implicit ground object, the oriented speaker.
Statements with spatial prepositions that are made from the relative frame of reference can change in truth value with a change in position or orientation of the speaker.
Example of English statement made from relative frame of reference: “The man is to the left of the house” (from speaker’s perspective, see FIGURE)
- Horizontal-Absolute
In the absolute frame of reference, objects are located according to coordinate systems imposed by fixed geographical landmarks.
This is the only frame that always exhibits logical transitivity (lake north of river, town north of lake -> town north of river).
It is a conceptually simple system, but it requires the speaker to keep track, at all times, of where the (usually non-visible) landmarks lie in relation to the objects being described.
Example of English statement made from absolute frame of reference: “The man is north of the house.”
- Vertical Frame of Reference
The vertical coordinate system is halfway between a relative frame of reference and an absolute frame of reference.
Because gravity almost always requires us to swivel our heads up to see the sky and down to see the ground, the vertical frame of reference is nearly an absolute system. Even if we turn our heads sideways, our notions of up and down generally don’t change.
The vertical frame of reference has permeated even the topological concepts in spatial prepositions. Notions like support, subposition, and superposition are now considered topological in our gravity-constrained world.
Example of English statement expressed from vertical frame of reference/topology: “The man is on the house.”
SPATIAL PREPOSITION LEARNING METHODS
Now that we know the various categories of spatial prepositions, we review research on cross-linguistic description of spatial prepositions to determine possible techniques for learning the prepositions.
Since it’s difficult to study children directly, the research reviewed focuses on describing spatial prepositions across languages. If a description method is found that can describe prepositions in all languages, then it’s possible that children employ the same description method cognitively.
While reviewing the methods suggested by various researchers, we realized that both the encoding of spatial memories and the learning from those memories likely differ between the topological spatial prepositions and the ones employing frames of reference. This follows from the domains the two types of prepositions cover. As described earlier, the topological prepositions are not concerned with orientation, whereas the others are primarily concerned with orientation.
Note that we treat the vertical frame of reference as topological, because it imposes the same learning constraints. For horizontal frames of reference, we focus on the relative frame of reference for two reasons: 1) asking the computer to employ an intrinsic system would also require it to learn feature assignment, a problem outside the scope of this research, and 2) once the object features were known, learning the intrinsic system would be just a simple transformation from the relative, speaker-based system to the ground-based system.
- Learning Topological Prepositions
Early research proposed a one-to-one mapping between spatial concepts and prepositions (e.g. ON to English “on”), or a many-to-one mapping (e.g. ON, IN to Spanish “en”).
However, Bowerman found cross-linguistic situations that could not possibly be described by these simple mappings, shown in the figure at right.
In a cross-linguistic study on adpositions, Levinson found the same disparate distribution as Bowerman, but also discovered the clustering of prepositions around combinations of spatial primitives like superposition, contact, containment. So, what was previously thought of as the ON universal spatial concept is actually a composition of superposition and contact.
With reference to the situations presented in [4] in the figure above, the English word “on” can be described as +contact, and the English word “in” can be described as +containment. The Finnish suffix -ssa can be described as a composite category with two foci, one at +containment and the other at +attachment, while the Finnish suffix -lla can be described as +contact, -containment, -attachment.
Building on Levinson's findings, we propose that children learn topological prepositions by analyzing the world in terms of spatial primitives.
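The feature descriptions above can be written down directly as data. The following is a minimal sketch, not the project's code: the dictionary layout, the function names, and the three example scenes are assumptions made for illustration, and only the feature values for each preposition come from the text.

```python
# Each preposition maps to a list of "foci"; a focus lists the primitive
# values it requires. Primitives not named in a focus are unconstrained.
# Only the feature values are taken from the text; the layout is hypothetical.
PREPOSITIONS = {
    "on (English)": [{"contact": True}],
    "in (English)": [{"containment": True}],
    "-ssa (Finnish)": [{"containment": True}, {"attachment": True}],
    "-lla (Finnish)": [{"contact": True, "containment": False, "attachment": False}],
}


def matches(focus, scene):
    """True if the scene has every primitive value the focus requires."""
    return all(scene.get(primitive, False) == value for primitive, value in focus.items())


def applicable(scene):
    """Names of all prepositions with at least one focus matching the scene."""
    return sorted(
        name
        for name, foci in PREPOSITIONS.items()
        if any(matches(focus, scene) for focus in foci)
    )


# Illustrative scenes (assumed values, not measurements from the figure).
cup_on_table = {"contact": True, "containment": False, "attachment": False}
apple_in_bowl = {"contact": True, "containment": True, "attachment": False}
handle_on_door = {"contact": True, "containment": False, "attachment": True}

print(applicable(cup_on_table))
print(applicable(handle_on_door))
```

Because “on” is described only as +contact, this coarse encoding also accepts it for the containment scene. The sketch mirrors the description as stated and does not try to refine it.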
- Learning Non-Topological Prepositions
In [5], O’Keefe suggests that all prepositions can be described by vector angles and lengths. We concluded, however, that it would be too difficult for a child to learn topological prepositions from vector data alone. There are too many vector combinations to record, and without some abstraction over the vectors, or an expectation that certain vector relationships matter, the child would need significantly more input to learn a preposition. Even once a preposition was learned, the child could not grasp an implicational hierarchy between prepositions if their knowledge were stored only as a large array of vectors, yet children do demonstrate such understanding.
However, a solely vector-based description works well for prepositions describing the relative frame of reference. To learn “left of” or “right of,” only the angle of the eye sweep from figure to ground needs to be recorded, and to learn “in front of” or “in back of,” only the magnitude and angle of the eye-to-object vectors need to be recorded. This information could be stored on an absolute scale (e.g. 35 degrees, 10 feet greater), on a relative scale (1/8 of eye viewing arc, 5x width of object more), or in binary form (negative angle, lesser distance). Considering how unreliable humans are at reporting exact numerical facts from memory, the information is probably stored in some hybrid of the relative and binary formats.
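To make the three storage formats concrete, here is a small sketch that encodes one signed eye-sweep angle in each of them. The function name and the 180-degree viewing arc are assumptions chosen for illustration; the text does not specify a value for the arc.

```python
def encode_angle(angle_deg, viewing_arc_deg=180.0):
    """Encode a signed eye-sweep angle in the three formats discussed above.

    viewing_arc_deg is a hypothetical parameter: the full arc the eye can
    sweep, used as the unit of the relative scale.
    """
    if angle_deg > 0:
        binary = "positive"
    elif angle_deg < 0:
        binary = "negative"
    else:
        binary = "none"
    return {
        "absolute": angle_deg,                    # e.g. 35 degrees
        "relative": angle_deg / viewing_arc_deg,  # fraction of the viewing arc
        "binary": binary,                         # sign only
    }


print(encode_angle(-22.5))
```

Under the assumed 180-degree arc, a sweep of -22.5 degrees is 1/8 of the arc in the negative direction. The binary format discards magnitude entirely, which is all that “left of” and “right of” need.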
- Deciding When to Learn What
Since we’ve proposed that the two categories of spatial prepositions are learned from different information, the child needs to know when to store which information.
O’Keefe pointed out that non-topological prepositions can be modified by a comparative like “more” (“The man is more to the left of the house than the woman”). A child who hears comparatives applied to some prepositions could infer that these must be described directly by a vector angle or magnitude.
- (Discarded) Infant Learning Methods
Brute force method: Child stores every annotated scene in memory, compares every new scene to that memory until it finds match
- Problem: Huge storage overhead in the brain, long comparison time, not very smart!
Accumulate method: For each preposition, child adds new data values onto one node.
- Problem: Some prepositions, like the Finnish “-ssa” in Bowerman's figure above, cover a non-continuous spatial domain (they describe two distinct situations). This method would learn that “-ssa” meant +/-containment, +/-attachment, i.e. it would have learned nothing at all!
Mixed-cause clustering: This technique comes from the AI research community. The child could collect all the unique cases, and learn a clustering of those cases. The child would learn that a preposition clustered around one or multiple spatial primitives.
- Problem: This is a decent method, but it fails to capture the implicational hierarchy of spatial prepositions (i.e. “in” -> “inside”).
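The failure of the accumulate method on a two-focus preposition can be reproduced in a few lines. This is a sketch under assumptions: the function name and the two example scenes for “-ssa” are hypothetical, chosen only to represent one containment case and one attachment case.

```python
def accumulate(examples):
    """Merge every observed value of each primitive into a single node."""
    node = {}
    for example in examples:
        for primitive, value in example.items():
            node.setdefault(primitive, set()).add(value)
    return node


# Two hypothetical scenes described with "-ssa": one per focus.
ssa_examples = [
    {"containment": True, "attachment": False},
    {"containment": False, "attachment": True},
]

node = accumulate(ssa_examples)

# A primitive is informative only if a single value was ever observed for it.
informative = {primitive for primitive, values in node.items() if len(values) == 1}
print(node, informative)
```

After only two examples, both primitives have been seen with both values, so the single node no longer constrains anything.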
- (Selected) Infant Learning Method
Decision Tree Learning: This technique comes from the AI research community. The child collects all the unique cases. At any point, even with minimal data, the child can learn a decision tree from this data, in which attribute values are queried in turn until a decision is reached, as in the example shown at right. The basic algorithm constructs the tree top-down, creating nodes for the attributes in order of information gain, until the attributes have been exhausted or the decision tree perfectly fits the examples.
- Problem: No two prepositions may be described by exactly the same attribute values (for either the spatial primitives or the eye-to-object vectors), so the method cannot learn complete synonyms. It may be possible to modify the method to handle this in the future.
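The top-down construction described above corresponds to the classic ID3 procedure. The following is a compact sketch of that procedure, not the project's implementation. The attribute names `sweep` and `distance`, their values, and the four training cases are assumptions, and which sweep sign means “left” is arbitrary.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attr):
    """Entropy reduction obtained by splitting the examples on attr."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {feats[attr] for feats, _ in examples}:
        subset = [label for feats, label in examples if feats[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(examples, attrs):
    """Build a tree top-down, splitting on the highest-gain attribute first."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:          # the tree fits these examples perfectly
        return labels[0]
    if not attrs:                      # attributes exhausted: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(examples, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({feats[best] for feats, _ in examples}):
        subset = [(f, l) for f, l in examples if f[best] == value]
        branches[value] = build_tree(subset, rest)
    return {best: branches}


def classify(tree, feats):
    """Follow attribute queries from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[feats[attr]]
    return tree


# Hypothetical unique cases: (attribute values, preposition heard).
cases = [
    ({"sweep": "negative", "distance": "equal"}, "left of"),
    ({"sweep": "positive", "distance": "equal"}, "right of"),
    ({"sweep": "none", "distance": "less"}, "in front of"),
    ({"sweep": "none", "distance": "greater"}, "in back of"),
]

tree = build_tree(cases, ["sweep", "distance"])
print(tree)
print(classify(tree, {"sweep": "none", "distance": "less"}))
```

In this sketch, `classify` raises a `KeyError` for an attribute value never seen during training. If two prepositions shared identical attribute values, the majority-label fallback would silently pick one of them, which is the synonym limitation noted above.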
IMPLEMENTATION
With the cognitive techniques for learning chosen, we can now implement the knowledge acquisition and learning of spatial prepositions within a 3-d computational environment.
A 3-d scene, shown above, is set up with simple primitives, and a GUI allows the user to input sentences like “ball is above cone.” The program finds the known objects in each input sentence and assigns them as figure and ground accordingly.
A brain node in the scene keeps track of memory, creating a node for each new example of a sentence template (“figure_isabove_ground”), and recording the values of the attributes after analyzing the scene.
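A rough sketch of how a sentence could be turned into a template and stored in memory follows. The object vocabulary, the class and function names, and the rule that the first known object is the figure are all assumptions made for illustration; only the template format “figure_isabove_ground” comes from the text.

```python
KNOWN_OBJECTS = {"ball", "cone", "cube"}  # hypothetical scene contents


def parse_sentence(sentence, known_objects=KNOWN_OBJECTS):
    """Split a sentence into (figure, ground, template).

    Assumes the first known object is the figure and the second the ground.
    """
    words = sentence.lower().strip(" .").split()
    positions = [i for i, word in enumerate(words) if word in known_objects]
    if len(positions) != 2:
        raise ValueError("expected exactly two known objects in the sentence")
    first, second = positions
    relation = "".join(words[first + 1:second])
    return words[first], words[second], f"figure_{relation}_ground"


class Memory:
    """Stores, per sentence template, the attribute values of each example."""

    def __init__(self):
        self.nodes = {}

    def record(self, sentence, attributes):
        _figure, _ground, template = parse_sentence(sentence)
        self.nodes.setdefault(template, []).append(dict(attributes))
        return template


memory = Memory()
template = memory.record("ball is above cone", {"superposition": True})
print(template, memory.nodes[template])
```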
The calculations for the attributes are shown in the tables below.
For prepositions describing relative frame of reference:
| direction of eye sweep | The angle between the vector from the eye to the figure and the vector from the eye to the ground is calculated, and the cross product of the two vectors is calculated to determine the direction of the angle. One direction is chosen as negative, and one as positive, arbitrarily. If the angle is 0 degrees, the result is “none.” |
| distance to eye | The magnitude of the vector from the eye to the figure and the magnitude of the vector from the eye to the ground are both calculated. The comparison between their magnitudes is reported as greater than, less than, or equal (within a very small threshold). |
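The two relative-frame attributes can be sketched with plain vector arithmetic. This is an illustration under assumptions, not the project's code: points are (x, y, z) tuples, the y axis is taken as “up,” the sign convention is arbitrary, and the thresholds are placeholder values. The sketch reports “none” whenever the sweep has no horizontal component, which includes the 0-degree case.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def eye_sweep_direction(eye, figure, ground, up=(0.0, 1.0, 0.0), eps=1e-9):
    """Sign of the sweep from figure to ground as seen from the eye.

    The cross product of the two eye-to-object vectors is projected onto the
    assumed up axis; its sign gives the direction of the sweep.
    """
    normal = _cross(_sub(figure, eye), _sub(ground, eye))
    signed = sum(n * u for n, u in zip(normal, up))
    if abs(signed) <= eps:
        return "none"
    return "positive" if signed > 0 else "negative"


def distance_to_eye(eye, figure, ground, eps=1e-3):
    """Compare the eye-to-figure distance with the eye-to-ground distance."""
    d_figure = math.dist(eye, figure)
    d_ground = math.dist(eye, ground)
    if abs(d_figure - d_ground) <= eps:
        return "equal"
    return "greater" if d_figure > d_ground else "less"


eye = (0.0, 0.0, 5.0)
print(eye_sweep_direction(eye, (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
print(distance_to_eye(eye, (0.0, 0.0, 0.0), (0.0, 0.0, -3.0)))
```

Swapping figure and ground flips the sign of the sweep, matching the contrast between “left of” and “right of.”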
For topological prepositions:
| horizontal proximity | The vectors between the horizontal bounds of the figure and ground are all compared, and the magnitude of the shortest vector is calculated. If that magnitude is within some small threshold, horizontal proximity is recorded as true. |
| vertical proximity | The vectors between the vertical bounds of the figure and ground are all compared, and the magnitude of the shortest vector is calculated. If that magnitude is within some small threshold, vertical proximity is recorded as true. |
| full containment of | The positions of the corners of the bounding boxes of the figure and ground are calculated. If the bounding box of the ground is within the bounding box of the figure, this property is recorded as true. |
| full containment by | The positions of the corners of the bounding boxes of the figure and ground are calculated. If the bounding box of the figure is within the bounding box of the ground, this property is recorded as true. |
| partial containment of | The positions of the corners of the bounding boxes of the figure and ground are calculated. Every corner of the ground is compared to the bounding box of the figure. If at least one of the corners is within the bounding box, then this property is recorded as true. |
| partial containment by | The positions of the corners of the bounding boxes of the figure and ground are calculated. Every corner of the figure is compared to the bounding box of the ground. If at least one of the corners is within the bounding box, then this property is recorded as true. |
| contact | The corners of the bounding boxes of the figure and ground are all compared with each other. If the distance between any pair of them is within a small threshold, contact is reported to be true. |
| attachment | All the objects in the scene are examined to see if there is a third object that exhibits both partial containment in the figure and partial containment in the ground. If such an object is found, attachment is reported as true. |
| superposition | The vector from the figure to the ground plane is compared to the vector from the ground object to the ground plane. If the magnitude of the vector from the figure to the ground plane is greater than that of the vector from the ground object to the ground plane, this property is reported as true. |
| subposition | The vector from the figure to the ground plane is compared to the vector from the ground object to the ground plane. If the magnitude of the vector from the ground object to the ground plane is greater than that of the vector from the figure to the ground plane, this property is reported as true. |
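Several of the topological attributes in the table reduce to simple tests on axis-aligned bounding boxes. The sketch below covers the containment and superposition rows only. It assumes boxes are given as (min corner, max corner) tuples, that y is the vertical axis with the ground plane at y = 0, and that heights are measured at box centers; none of these details are stated in the text.

```python
from itertools import product


def corners(box):
    """All eight corners of an axis-aligned box given as (lo, hi) tuples."""
    lo, hi = box
    return list(product(*zip(lo, hi)))


def inside(point, box):
    """True if the point lies within the box (boundaries included)."""
    lo, hi = box
    return all(l <= p <= h for p, l, h in zip(point, lo, hi))


def partial_containment_by(figure, ground):
    """At least one corner of the figure lies within the ground's box."""
    return any(inside(c, ground) for c in corners(figure))


def full_containment_by(figure, ground):
    """Every corner of the figure lies within the ground's box."""
    return all(inside(c, ground) for c in corners(figure))


def height(box, up_axis=1):
    """Height of the box center above the assumed ground plane (y = 0)."""
    lo, hi = box
    return (lo[up_axis] + hi[up_axis]) / 2.0


def superposition(figure, ground):
    """The figure is higher above the ground plane than the ground object."""
    return height(figure) > height(ground)


# Hypothetical boxes: a bowl, an apple inside it, and a ball floating above it.
bowl = ((0.0, 0.0, 0.0), (4.0, 2.0, 4.0))
apple = ((1.0, 0.5, 1.0), (2.0, 1.5, 2.0))
ball = ((1.0, 3.0, 1.0), (2.0, 4.0, 2.0))

print(full_containment_by(apple, bowl), superposition(ball, bowl))
```

The “containment of” rows follow by swapping the arguments, for example `full_containment_by(ground, figure)`, and subposition is `superposition(ground, figure)`.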
RESULTS
For prepositions describing relative frame of reference:
Decision tree generated after describing scenes with “in back of,” “in front of,” “right of,” and “left of.”
The algorithm creates a clean decision tree that classifies situations correctly, but only when it learns from just the direction-of-eye-sweep and distance-to-eye attributes. When the other attributes are included, it needs much more data to produce a clean tree.
For topological prepositions:
Decision trees, above, are generated after describing the 3 scenes from FIGURE 1 in three of the languages, shown below in the 3-d environment.
The memory nodes for each of the languages have to be input into the algorithm separately because of overlapping prepositions.
Though the decision trees are not always as specific as we would expect, since they are presented with only these three situations, they are accurate and distinct. Using this method, children could start learning the relation of spatial concepts to the spatial prepositions in their input immediately, with no need for massive input. The algorithm is also storage-efficient, as it only creates as large a tree as necessary.
CONCLUSION
We have proposed memory recording and learning methods for learning two broad categories of spatial prepositions. The results of implementing these methods are promising, which suggests they could be employed by children as successfully as by computers.
The success of testing spatial preposition learning in a 3-d computational framework should encourage similar testing of a broader range of spatio-temporal theories.
Work remains to refine the exact list of spatial primitives and to test the other horizontal frames of reference (intrinsic, absolute).
ACKNOWLEDGMENTS
I would like to thank Barry Schein, my advisor for this project, as well as David Kempe, Toby Mintz, and Jerry Hobbs for their advice.
REFERENCES
[1] Levinson, S. C. & Meira, S. “‘Natural concepts’ in the spatial topological domain – adpositional meanings in crosslinguistic perspective: an exercise in semantic typology.” Language 79, pp. 485-516, 2003.
[2] Smith, B. “Topological foundations of cognitive science.” In C. Eschenbach, C. Habel, & B. Smith (Eds.), Topological Foundations of Cognitive Science, pp. 3-22. Hamburg: Graduiertenkolleg Kognitionswissenschaft, 1994.
[3] Levinson, S. C. “Frames of reference and Molyneux’s question: cross-linguistic evidence.” In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Eds.), Language and Space, pp. 385-436. Cambridge, MA: MIT Press, 1996.
[4] Bowerman, M. “Learning how to structure space for language.” In M. Garrett (Ed.), Language and Space, pp. 385-436. MIT Press, 1996.
[5] O’Keefe, J. “The spatial prepositions in English, vector grammar and the cognitive map theory.” In M. Garrett (Ed.), Language and Space, pp. 227-316. MIT Press, Cambridge.