Robert A. Jacobs; Chenliang Xu
This Python code implements a system described in the article: Jacobs, R. A. & Xu, C. (2019). Can multisensory training aid visual learning?: A computational investigation. Journal of Vision, in press. The code and the text here will make much more sense if the reader first reads the article.

As described in the article, we implemented a beta variational autoencoder (beta-VAE) that received both visual and haptic signals regarding the shapes of objects. The implementation here is a slight variant of the implementation described by Louis Tiao in his web post titled "Implementing Variational Autoencoders in Keras: Beyond the Quickstart Tutorial".

In this code, the haptic data are the GraspIt! joint angles for each Fribble. Recall that GraspIt! has 16 joints and that each Fribble was grasped 24 times, meaning that there are 16 x 24 = 384 values per Fribble. The dimensionality of these values was then reduced via PCA to 200 features (accounting for more than 99% of the variance in the haptic values). Each low-dimensional feature was then normalized to have a mean of zero and a variance of one.

The visual data items were created as follows. First, there are two images of each Fribble: the original image and a left-right flipped image. These images were presented to VGG16, and we extracted the output of the convolutional base (7 x 7 x 512 = 25088 values). Given the values of the convolutional base for each image of each Fribble (2 images x 891 Fribbles = 1782 images), we then used PCA to reduce the dimensionality to 200 (accounting for more than 97% of the variance in the convolutional base values). Each feature in the low-dimensional space was then normalized to have a mean of zero and a variance of one.

For each Fribble, there are 2 data items:
-- original image and haptic data
-- flipped image and haptic data

For each data item, the target labels include both the visual and the haptic data.
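The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the code released with the article: the function names (pca_reduce, standardize, build_data_items) are hypothetical, PCA is done with a plain NumPy SVD rather than whatever tool the authors used, the row ordering (all original images first, then all flipped images) is an assumption, and small random arrays stand in for the real VGG16 and GraspIt! values.

```python
import numpy as np


def pca_reduce(x, n_components):
    """Project the rows of x onto the top principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return centered @ vt[:n_components].T, explained


def standardize(x):
    """Scale each column to a mean of zero and a variance of one."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def build_data_items(visual_raw, haptic_raw, n_components):
    """Build one row per data item: [visual features | haptic features].

    visual_raw: (2 * n_objects, d_visual), assumed to hold all original
                images first, then all flipped images, in the same order.
    haptic_raw: (n_objects, d_haptic), one row of joint angles per object.
    """
    visual, _ = pca_reduce(visual_raw, n_components)
    haptic, _ = pca_reduce(haptic_raw, n_components)
    visual = standardize(visual)
    haptic = standardize(haptic)
    # Each object's haptic features are paired with both of its images.
    haptic_twice = np.concatenate([haptic, haptic], axis=0)
    return np.concatenate([visual, haptic_twice], axis=1)


# Toy stand-in sizes. The sizes described in the text would be
# n_objects = 891, d_visual = 25088, d_haptic = 384, n_components = 200.
rng = np.random.default_rng(0)
n_objects, d_visual, d_haptic, n_components = 50, 60, 40, 10
visual_raw = rng.normal(size=(2 * n_objects, d_visual))
haptic_raw = rng.normal(size=(n_objects, d_haptic))

items = build_data_items(visual_raw, haptic_raw, n_components)
# Inputs and targets are the same array, since each data item's target
# includes both its visual and its haptic features.
inputs, targets = items, items
```

With the sizes given in the text, each data item would have 200 visual plus 200 haptic features, and there would be 1782 data items in total.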