(Tel Aviv University and Facebook AI Research, Israel)
I will describe four strategies for building robust models that generalize beyond their specific training datasets. The first is applied to computerized archeology as part of the ArchAIDE project, the next two are applied to document analysis, and the last is applied to music generation and was developed by the Facebook AI Research team in Tel Aviv.
(i) By employing a PointNet-like architecture coupled with domain-specific augmentation, we are able to learn, from synthetic 3D data, to identify images of pottery sherds. The method handles the many different ways in which a 3D model can break, and it classifies sherds by the shape of the fracture alone.
(ii) By designing the network to be language- and style-agnostic, we obtain a generic word detector. Our approach efficiently detects words in a variety of scanned document images, including both historical and modern handwritten documents, and achieves excellent results on several benchmarks from the literature.
(iii) By using virtual samples, we are able to train a Handwritten Character Recognition system on purely synthetic data and apply it to ancient documents. We focus on low-resource languages and, for example, report a substantial increase in transcription accuracy on a test set of 167 images from the bKa’ gdams gsung ’bum Tibetan collection.
(iv) By reusing a single music encoder network, we are able to perform convincing translations across musical instruments, genres, and styles. Thanks to a diverse training dataset and a high-capacity network, the domain-independent encoder can translate even from musical domains that were not seen during training. The method is unsupervised: it relies neither on matched samples between domains nor on musical transcriptions.
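Item (i) rests on two mechanisms that can be illustrated in a few lines: a PointNet-style global feature, where one shared function is applied to every point and the results are max-pooled so the output does not depend on point order, and an augmentation that simulates breakage. The sketch below is a minimal, stdlib-only illustration under our own assumptions. A single random planar cut stands in for the fracture model, and `random_fracture`, `shared_layer`, and `global_feature` are hypothetical names with random, untrained weights. It is not the ArchAIDE implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def random_fracture(points: Sequence[Point], rng: random.Random) -> List[Point]:
    """Hypothetical augmentation: simulate a break with one random planar cut.

    Assumption: a single plane is a crude stand-in for real fracture surfaces.
    The plane passes through a randomly chosen point, so the result is never empty.
    """
    while True:
        # Isotropic random direction for the plane normal.
        normal = [rng.gauss(0.0, 1.0) for _ in range(3)]
        length = math.sqrt(sum(c * c for c in normal))
        if length > 1e-9:
            break
    normal = [c / length for c in normal]
    anchor = rng.choice(points)
    # Keep points on the non-negative side of the plane (signed distance >= 0).
    return [p for p in points
            if sum(normal[i] * (p[i] - anchor[i]) for i in range(3)) >= 0.0]


def shared_layer(p: Point, weights: Sequence[Sequence[float]]) -> List[float]:
    # The same linear map + ReLU is applied to every point independently.
    # Each weight row is (w_x, w_y, w_z, bias).
    return [max(0.0, w[0] * p[0] + w[1] * p[1] + w[2] * p[2] + w[3]) for w in weights]


def global_feature(points: Sequence[Point], weights: Sequence[Sequence[float]]) -> List[float]:
    # Symmetric max-pooling over points makes the feature order-invariant.
    per_point = [shared_layer(p, weights) for p in points]
    return [max(f[k] for f in per_point) for k in range(len(weights))]


# Usage with random, untrained weights (illustration only).
rng = random.Random(0)
cloud = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(200)]
weights = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
sherd = random_fracture(cloud, rng)
feature = global_feature(sherd, weights)
shuffled = list(sherd)
rng.shuffle(shuffled)
print(len(sherd), global_feature(shuffled, weights) == feature)
```

Because max-pooling is symmetric, shuffling the fragment's points leaves the feature unchanged. This is what lets such a network consume unordered point sets, including partial ones produced by the simulated break.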
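The routing described in (iv) can be sketched as one encoder shared by all domains plus one decoder per target domain. Translating means encoding with the shared encoder and decoding with the target domain's decoder, so the source domain never has to be named. This is a toy, stdlib-only illustration under stated assumptions. `toy_encoder`, `make_toy_decoder`, and `SharedEncoderTranslator` are hypothetical, and the "networks" are fixed arithmetic rather than learned models. The plain reconstruction objective in `reconstruction_pairs` is our simplification of how such a system can be trained without matched samples, not the FAIR training recipe.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Signal = List[float]
Latent = List[float]


def toy_encoder(x: Signal) -> Latent:
    # Hypothetical stand-in for the shared, domain-independent encoder:
    # averages adjacent samples (a learned network in practice).
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def make_toy_decoder(gain: float) -> Callable[[Latent], Signal]:
    # Hypothetical per-domain decoder: upsamples the code and applies a
    # domain-specific gain as a stand-in for instrument "timbre".
    def decode(z: Latent) -> Signal:
        return [gain * v for v in z for _ in range(2)]
    return decode


class SharedEncoderTranslator:
    def __init__(self, encoder: Callable[[Signal], Latent],
                 decoders: Dict[str, Callable[[Latent], Signal]]) -> None:
        self.encoder = encoder
        self.decoders = decoders

    def translate(self, audio: Signal, target: str) -> Signal:
        # No source-domain argument: any input, including one from a domain
        # unseen during training, passes through the same encoder.
        return self.decoders[target](self.encoder(audio))


def reconstruction_pairs(
    datasets: Dict[str, List[Signal]]
) -> Iterator[Tuple[str, Signal, Signal]]:
    # Assumed unsupervised objective: each domain's decoder learns to rebuild
    # that domain's own clips from the shared code, so no cross-domain
    # matched pairs or transcriptions are required.
    for domain, clips in datasets.items():
        for clip in clips:
            yield domain, clip, clip


translator = SharedEncoderTranslator(
    toy_encoder, {"piano": make_toy_decoder(1.0), "strings": make_toy_decoder(0.5)}
)
clip = [0.0, 2.0, 4.0, 4.0]
out = translator.translate(clip, "strings")
pairs = list(reconstruction_pairs({"piano": [clip], "strings": [[1.0, 1.0]]}))
print(out, len(pairs))
```

In this sketch, adding a new target domain means adding a decoder. The encoder, and therefore the set of inputs it accepts, stays the same.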