A Computational Framework for Simulating Cross-Linguistic Acquisition of Spatial Prepositions

Pamela Fox
University of Southern California, Los Angeles
pamela.fox at alumni.usc.edu






INTRODUCTION

The current trend in teaching computers about human languages is to combine massive amounts of data with brute-force statistical methods.

This project instead aims to teach a computer a language the way a child would learn it: by analyzing a 3-dimensional world together with linguistic input.

Because spatial prepositions describe basic 3-dimensional concepts that must be acquired early in childhood, they are the current focus of this project.




BACKGROUND TERMINOLOGY & RESEARCH

We first review terminology and agreed categories of spatial prepositions, so we can decide later if children might use different strategies for different categories.




SPATIAL PREPOSITION LEARNING METHODS

Now that we know the various categories of spatial prepositions, we review research on cross-linguistic description of spatial prepositions to determine possible techniques for learning the prepositions.

Since it is difficult to study children directly, the research reviewed focuses on describing spatial prepositions across languages. If a description method is found that can describe prepositions in all languages, then it is possible that children employ the same description method cognitively.

While reviewing the methods suggested by this research, we realized that both the encoding of spatial memories and the learning from those memories likely differ between the topological spatial prepositions and the ones that employ frames of reference. This follows from the domains the two types of prepositions cover: as described earlier, topological prepositions are not concerned with orientation, whereas frame-of-reference prepositions are primarily concerned with it.

Note that we treat the vertical frame of reference as topological, since it imposes the same learning constraints. For horizontal frames of reference, we focus on the relative frame of reference for two reasons: 1) asking the computer to employ an intrinsic system would also require it to learn how to perform feature assignment, which is a problem outside the scope of this research, and 2) once the object features were known, learning the intrinsic system would be just a simple transformation of the relative speaker-based system to the ground-based system.




IMPLEMENTATION

With the cognitive techniques for learning chosen, we can now implement the knowledge acquisition and learning of spatial prepositions within a 3-d computational environment.

A 3-d scene, shown above, is set up with simple primitives, and a GUI allows the user to enter sentences like “ball is above cone.” The program finds the known objects in each entered sentence and assigns them as figure and ground accordingly.
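The figure and ground assignment can be sketched as follows. This is a minimal illustration, not the project's actual parser: the function name `parse_sentence`, the `ParsedSentence` type, and the rule that the first known object is the figure and the second is the ground are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class ParsedSentence(NamedTuple):
    figure: str
    template: str
    ground: str


def parse_sentence(sentence: str, known_objects: set) -> Optional[ParsedSentence]:
    """Split a sentence like "ball is above cone." into figure, template, ground.

    Assumption: the first known object is the figure and the second is the
    ground; the words between them form the relation.
    """
    words = sentence.lower().strip(" .").split()
    positions = [i for i, word in enumerate(words) if word in known_objects]
    if len(positions) != 2:
        return None  # need exactly one figure and one ground
    first, second = positions
    relation = "".join(words[first + 1:second])
    if not relation:
        return None  # no relation words between the two objects
    template = f"figure_{relation}_ground"
    return ParsedSentence(words[first], template, words[second])


parsed = parse_sentence("ball is above cone.", {"ball", "cone", "cube"})
print(parsed)
```

Sentences that do not mention exactly two known objects are rejected here, which is one simple way to handle input the scene cannot ground.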

A brain node in the scene keeps track of memory, creating a node for each new example of a sentence template (“figure_isabove_ground”), and recording the values of the attributes after analyzing the scene.
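One way to picture this memory is a store keyed by sentence template, with one recorded node per described scene. The sketch below is an assumption about the data layout, not the project's scene-graph code; the class name `Brain`, its methods, and the attribute names in the usage lines are hypothetical.

```python
from collections import defaultdict


class Brain:
    """Hypothetical memory store: one list of attribute records per template."""

    def __init__(self):
        self.memories = defaultdict(list)

    def record(self, template, attributes):
        # Store a copy so later changes to the scene do not alter the memory.
        self.memories[template].append(dict(attributes))

    def examples(self):
        # Flatten to (attributes, template) pairs, the shape a learner consumes.
        return [
            (attributes, template)
            for template, nodes in self.memories.items()
            for attributes in nodes
        ]


brain = Brain()
brain.record("figure_isabove_ground", {"superposition": True, "contact": False})
brain.record("figure_isabove_ground", {"superposition": True, "contact": True})
print(len(brain.examples()))
```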

The calculations for the attributes are shown in the tables below.

For prepositions describing relative frame of reference:

direction of eye sweep: The angle between the vector from the eye to the figure and the vector from the eye to the ground is calculated, and the cross product of the two vectors is used to determine the direction of the angle. One direction is arbitrarily chosen as negative and the other as positive. If the angle is 0 degrees, the result is “none.”
distance to eye: The magnitude of the vector from the eye to the figure and the magnitude of the vector from the eye to the ground are both calculated. The comparison between the two magnitudes is reported as greater than, less than, or equal (within a very small threshold).
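The two relative-frame attributes can be sketched with plain vector arithmetic. The code below is an illustration under stated assumptions rather than the project's implementation: positions are (x, y, z) tuples, the y axis is taken as "up", the sign of the sweep is read from the cross product's component along that axis, and the function names and thresholds are made up for the example. Under this choice, two objects that differ only in height as seen from the eye also yield "none".

```python
import math


def subtract(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def eye_sweep(eye, figure, ground, up=(0.0, 1.0, 0.0), eps=1e-6):
    """Return (angle in degrees, direction) of the sweep from figure to ground."""
    to_figure = subtract(figure, eye)
    to_ground = subtract(ground, eye)
    normal = cross(to_figure, to_ground)
    angle = math.degrees(
        math.atan2(math.sqrt(dot(normal, normal)), dot(to_figure, to_ground))
    )
    signed = dot(normal, up)  # which way the sweep turns about the up axis
    if angle <= eps or abs(signed) <= eps:
        return angle, "none"
    return angle, "positive" if signed > 0 else "negative"


def distance_to_eye(eye, figure, ground, threshold=1e-3):
    """Compare the figure's distance from the eye with the ground's."""
    figure_distance = math.dist(eye, figure)
    ground_distance = math.dist(eye, ground)
    if abs(figure_distance - ground_distance) <= threshold:
        return "equal"
    return "greater" if figure_distance > ground_distance else "less"


eye = (0.0, 0.0, 5.0)
print(eye_sweep(eye, (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
print(distance_to_eye(eye, (0.0, 0.0, 0.0), (0.0, 0.0, -3.0)))
```

Which sign corresponds to "left of" or "right of" is deliberately left open, since the learner only needs the two directions to be distinguished consistently.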

For topological prepositions:

horizontal proximity: The vectors between the horizontal bounds of the figure and ground are all compared, and the magnitude of the shortest vector is calculated. If that magnitude is within some small threshold, horizontal proximity is recorded as true.
vertical proximity: The vectors between the vertical bounds of the figure and ground are all compared, and the magnitude of the shortest vector is calculated. If that magnitude is within some small threshold, vertical proximity is recorded as true.
full containment of: The positions of the corners of the bounding boxes of the figure and ground are calculated. If the bounding box of the ground is within the bounding box of the figure, this property is recorded as true.
full containment by: The positions of the corners of the bounding boxes of the figure and ground are calculated. If the bounding box of the figure is within the bounding box of the ground, this property is recorded as true.
partial containment of: The positions of the corners of the bounding boxes of the figure and ground are calculated. Every corner of the ground is compared to the bounding box of the figure. If at least one of the corners is within the bounding box, this property is recorded as true.
partial containment by: The positions of the corners of the bounding boxes of the figure and ground are calculated. Every corner of the figure is compared to the bounding box of the ground. If at least one of the corners is within the bounding box, this property is recorded as true.
contact: The corners of the bounding boxes of the figure and ground are all compared with each other. If the distance between any pair of corners is within a small threshold, contact is recorded as true.
attachment: All the objects in the scene are examined to see if there is a third object that exhibits both partial containment in the figure and partial containment in the ground. If such an object is found, attachment is recorded as true.
superposition: The vector from the figure to the ground plane is compared to the vector from the ground object to the ground plane. If the magnitude of the vector from the figure to the ground plane is greater than that of the vector from the ground object to the ground plane, this property is recorded as true.
subposition: The vector from the figure to the ground plane is compared to the vector from the ground object to the ground plane. If the magnitude of the vector from the ground object to the ground plane is greater than that of the vector from the figure to the ground plane, this property is recorded as true.
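Several of the topological attributes reduce to simple tests on bounding boxes. The sketch below illustrates containment, contact, superposition, and subposition; it is not the project's code. It assumes axis-aligned bounding boxes given by minimum and maximum corners, a ground plane at y = 0, object height measured at the box center, and inclusive bounds, and every name in it (`Box`, `topological_attributes`, and so on) is hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    lo: tuple  # minimum (x, y, z) corner
    hi: tuple  # maximum (x, y, z) corner

    def corners(self):
        # All 8 combinations of low/high values along each axis.
        return list(product(*zip(self.lo, self.hi)))

    def contains(self, point):
        return all(l <= v <= h for l, v, h in zip(self.lo, point, self.hi))

    def height(self, plane_y=0.0):
        # Distance from the box center to the ground plane (assumed y = plane_y).
        return abs((self.lo[1] + self.hi[1]) / 2 - plane_y)


def fully_contained_by(inner, outer):
    return all(outer.contains(corner) for corner in inner.corners())


def partially_contained_by(inner, outer):
    return any(outer.contains(corner) for corner in inner.corners())


def in_contact(a, b, threshold=1e-3):
    return any(
        math.dist(p, q) <= threshold for p in a.corners() for q in b.corners()
    )


def topological_attributes(figure, ground):
    return {
        "full containment of": fully_contained_by(ground, figure),
        "full containment by": fully_contained_by(figure, ground),
        "partial containment of": partially_contained_by(ground, figure),
        "partial containment by": partially_contained_by(figure, ground),
        "contact": in_contact(figure, ground),
        "superposition": figure.height() > ground.height(),
        "subposition": ground.height() > figure.height(),
    }


ball = Box((0.25, 0.25, 0.25), (0.75, 0.75, 0.75))
bowl = Box((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(topological_attributes(ball, bowl))
```

Because contact is tested on corners only, as in the description above, two boxes whose faces touch without any corners coinciding would not register as being in contact; a face-distance test would be needed to catch that case.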



RESULTS


For prepositions describing relative frame of reference:

Decision tree generated after describing scenes with “in back of,” “in front of,” “right of,” and “left of.”

The algorithm creates a clean decision tree that classifies situations correctly, but only when it learns from just the direction of eye sweep and distance to eye attributes. When the other attributes are included, it needs much more data to produce a clean tree.
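To make the learning step concrete, the sketch below builds a decision tree from categorical attribute records using information gain, in the style of ID3. The specific tree-induction algorithm is an assumption chosen for illustration. The four training examples are hypothetical toy data, not results from the project, and the mapping of sweep sign to "left of" or "right of" is arbitrary.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(examples, attribute):
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {attrs[attribute] for attrs, _ in examples}:
        subset = [label for attrs, label in examples if attrs[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(examples, attributes):
    """Return a leaf label, or an (attribute, {value: subtree}) pair."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(attrs, label) for attrs, label in examples if attrs[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches)


def classify(tree, attrs):
    # Walk down until a leaf (a plain label) is reached.
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[attrs[attribute]]
    return tree


# Hypothetical toy memories: (attribute values, preposition).
examples = [
    ({"sweep": "negative", "distance": "equal"}, "left of"),
    ({"sweep": "positive", "distance": "equal"}, "right of"),
    ({"sweep": "none", "distance": "less"}, "in front of"),
    ({"sweep": "none", "distance": "greater"}, "in back of"),
]
tree = build_tree(examples, ["sweep", "distance"])
print(tree)
print(classify(tree, {"sweep": "none", "distance": "less"}))
```

With only the two relevant attributes, four examples are enough to separate the four prepositions. Adding attributes that do not bear on the distinction gives the learner more candidate splits to rule out, which is consistent with the observation that more data is then needed for a clean tree.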


For topological prepositions:

Decision trees, above, are generated after describing the 3 scenes from FIGURE 1 in three of the languages, shown below in the 3-d environment.

The memory nodes for each language have to be entered into the algorithm separately because of overlapping prepositions.

Though the decision trees are not always as specific as we would expect, since they are presented with only these three situations, they are accurate and distinct. Using this method, children could start relating spatial concepts to the spatial prepositions in their input immediately, with no need for massive input. The algorithm is also storage-efficient, as it only creates as large a tree as necessary.





CONCLUSION

We have proposed memory recording and learning methods for learning two broad categories of spatial prepositions. The results of implementing these methods are promising, which suggests they could be employed by children as successfully as by computers.

The success of testing spatial preposition learning in a 3-d computational framework should encourage similar testing of a broader range of spatio-temporal theories.

There is still work remaining to refine the exact list of spatial primitives and to test the other horizontal frames of reference (intrinsic and absolute).




ACKNOWLEDGMENTS

I would like to thank Barry Schein, my advisor for this project, as well as David Kempe, Toby Mintz, and Jerry Hobbs for their advice.





REFERENCES

  1. Levinson & Meira, “‘Natural concepts’ in the spatial topological domain – adpositional meanings in crosslinguistic perspective: an exercise in semantic typology,” Language 79, pp. 485-516, 2003.
  2. Smith, B., “Topological foundations of cognitive science,” in C. Eschenbach, C. Habel, & B. Smith (eds.), Topological Foundations of Cognitive Science, pp. 3-22. Hamburg: Graduiertenkolleg Kognitionswissenschaft, 1994.
  3. Levinson, S. C., “Frames of reference and Molyneux’s question: cross-linguistic evidence,” in P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (eds.), Language and Space. Language, speech, and communication, pp. 385-436. Cambridge, MA: MIT Press, 1996.
  4. Bowerman, M., “Learning how to structure space for language,” in M. Garrett (ed.), Language and Space, pp. 385-436. MIT Press, 1996.
  5. O’Keefe, J., “The spatial prepositions in English, vector grammar and the cognitive map theory,” in M. Garrett (ed.), Language and Space, pp. 227-316. MIT Press, Cambridge.