3DMSE: An Interactive 3D Media Search Engine

We present the 3D Media Search Engine (3DMSE), which is designed to facilitate the exploration and retrieval of 3D models and images. 3DMSE supports unimodal, cross-modal and multimodal retrieval, using any combination of mesh, point cloud and multi-image representations. The 3DMSE system is built on the recently proposed MuseHash approach for multimodal representation, and offers a user-friendly web interface for formulating queries, presenting search results, and visualising 3D information in an accessible manner.


INTRODUCTION
In recent years, advancements in 3D modeling tools [16], scanning technology [12], and consumer devices with 3D sensors [4] have transformed the accessibility of vast 3D content collections, influencing various domains, from entertainment and gaming to healthcare [25], archaeology [1], computer-aided design (CAD) [7] and autonomous systems [6]. These advances have not only made 3D content creation accessible to anyone, but have also transformed industries by enabling faster design processes and better-informed decisions.
There are various 3D applications specifically designed for retrieval and management. CAD software packages (e.g., Autodesk AutoCAD [9], SolidWorks [15], and CATIA [21]) often include built-in features for managing and retrieving 3D data. Geospatial Information Systems (GIS, e.g., Esri ArcGIS [23] and QGIS [3]) also offer tools for managing and visualizing 3D geospatial data, while Extended Reality (VR/AR/MR) platforms (e.g., Unity [14] and Unreal Engine [19]) provide similar asset-management capabilities. Efficiently navigating this abundance of models, however, remains a challenge.
Techniques for retrieving 3D models play a crucial role in various applications, allowing users to find relevant models based on classification or similarity criteria [5,8,13,17,20,24]. The 3D data can be represented in different forms: as point clouds, which consist of points within the model's spatial domain; as meshes, represented by interconnected triangles that approximate the shape; or as multiview images, comprising a set of images captured from various viewpoints of the 3D shape. These diverse 3D representations can be treated as distinct modalities, and handling them effectively has been addressed by the MuseHash approach. This demonstration paper describes the 3DMSE search engine, which builds a user-friendly web-based interface on top of the MuseHash approach.
The remainder of this paper is organised as follows: Section 2 presents MuseHash, the state-of-the-art multimodal retrieval method underlying 3DMSE, while Section 3 details the UI and its features. Section 4 presents two example retrieval scenarios using the 3DMSE system. Finally, Section 5 provides the conclusion.

SIMILARITY SEARCH WITH MUSEHASH
The 3DMSE system integrates the advanced multimodal retrieval method MuseHash with an interactive Graphical User Interface (GUI), enabling browsing in supported 3D collections across various data and retrieval types. The system is compatible with a wide array of data types, including meshes (interconnected triangles that approximate the shape), point clouds (data points within the model's spatial domain), and images, offering versatility for users' needs. The retrieval types in 3DMSE encompass uni-modal, cross-modal, and multimodal approaches, providing users with diverse options for searching and interacting with 3D models. Uni-modal retrieval focuses on a single data modality, cross-modal retrieval enables searching one data modality based on a query in a different data modality, while multimodal retrieval integrates multiple data modalities for comprehensive results.
The retrieval module utilises MuseHash [17,18], a Bayesian-based method for retrieving visually similar content across various modalities. It employs Bayesian techniques to learn hash functions, enabling retrieval in unimodal and cross-modal (m = 1) as well as multimodal (m > 1) scenarios, where m denotes the number of query modalities. The visual modality involves averaging 180 feature vectors from ResNet50's fc-7 layer [11], resulting in a 2048-D vector. For the point cloud and mesh modalities, 256-D vectors are obtained directly from DGCNN [24] and MeshNet [5], respectively. These feature vectors are transformed into hash codes using the learned hash functions. In the multimodal case, the hash codes of the different modalities are fused into a single code h_i = h_{i,1} ⊕ h_{i,2} ⊕ … ⊕ h_{i,m}, where h_{i,j} is the hash code of the i-th instance and j-th modality, and ⊕ denotes the XOR operation between hash codes. Finally, retrieved items are ranked by Hamming distance to the query code.
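The fusion and ranking steps can be sketched as follows. This is a minimal illustration with toy hash codes; the function names and the 16-bit code length are our own assumptions, not taken from the MuseHash implementation:

```python
# Toy sketch of XOR-based hash fusion and Hamming-distance ranking.
# Hash codes are modeled as Python ints over 16 bits (an illustrative
# choice, not the actual MuseHash code length).

def fuse_hash_codes(codes):
    """Fuse per-modality hash codes of one instance with XOR."""
    fused = 0
    for code in codes:
        fused ^= code
    return fused

def hamming_distance(a, b):
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from most to least similar."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))

# Toy database of three fused codes and a two-modality query.
db = [0b1010110010101100, 0b1010110010101111, 0b0101001101010011]
query = fuse_hash_codes([0b1010110010101110, 0b0000000000000001])
print(rank_by_hamming(query, db))  # → [1, 0, 2]
```

Because ranking reduces to bit-counting on XORed codes, retrieval stays cheap even for large collections, which is the practical appeal of hashing-based search.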

USER INTERFACE AND SEARCH MODES
The 3DMSE system integrates two public benchmark 3D datasets: the BuildingNet_v0 dataset [22], which contains different building models (e.g., church, hotel), and the ShapeNetCore dataset [2], which includes different object models (e.g., airplane, car). The system provides users with the capability of locally storing data (Section 3.1), empowers users to seamlessly navigate through the stored data (Section 3.2), and enables retrieving specific information directly from the database (Section 3.3).
In 3DMSE, a robust technology stack ensures seamless functionality and user experience. Back-end data processing is implemented in PHP, using MongoDB [10] to securely store model information. Each MongoDB collection corresponds to a specific dataset (e.g., ModelNet40, BuildingNet_v0), organizing documents that represent individual 3D models. These documents include information such as the file name, the category, and metadata for the types of models (mesh, point cloud, image) each dataset supports. The intuitive front-end interface is implemented with HTML/CSS/JavaScript, using Google's model-viewer.js to render mesh models for an immersive viewing experience, while Three.js facilitates dynamic interaction with point cloud models.
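The per-model documents described above might look like the following sketch. All field names, paths and values here are illustrative assumptions, not the actual 3DMSE schema:

```python
# Illustrative shape of a per-model document as stored in a MongoDB
# collection (field names and paths are hypothetical examples).

model_doc = {
    "file_name": "COMMERCIALhotel_building_mesh1286",
    "category": "hotel",
    "dataset": "BuildingNet_v0",
    # Which representations this dataset provides for the model,
    # mapped to their (hypothetical) storage paths.
    "modalities": {
        "mesh": "models/mesh/COMMERCIALhotel_building_mesh1286.gltf",
        "point_cloud": "models/pcd/COMMERCIALhotel_building_mesh1286.ply",
        "image": "snapshots/COMMERCIALhotel_building_mesh1286.png",
    },
}

def supported_modalities(doc):
    """List the modality types stored for a model document."""
    return sorted(doc["modalities"])

print(supported_modalities(model_doc))  # → ['image', 'mesh', 'point_cloud']
```

Keeping the modality-to-path mapping inside each document lets the interface decide, per dataset, which of the three modality options to expose.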
Figure 1 shows the starting screen of the interface. Upon entering 3DMSE, users see the initial results panel, which displays snapshots of the models. At the top-right of this component is a drop-down list from which users can navigate to any of the available datasets (currently, BuildingNet_v0 and ShapeNetCore) and view their respective models. In the same corner, another drop-down list enables users to select among the three modality types available for each dataset: Mesh, Point Cloud and Image. A pagination control is also provided in the top-left corner, enabling transitions between pages that contain 100 models each. These controls are shown in Figure 1.
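The 100-models-per-page behaviour can be sketched as follows (helper names and the server-side placement are our own illustrative assumptions):

```python
# Minimal sketch of fixed-size pagination as described above.

PAGE_SIZE = 100  # models displayed per page in the results panel

def page_slice(models, page):
    """Return the models shown on a given 1-indexed page."""
    start = (page - 1) * PAGE_SIZE
    return models[start:start + PAGE_SIZE]

def page_count(models):
    """Total number of pages needed for a collection (at least one)."""
    return max(1, -(-len(models) // PAGE_SIZE))  # ceiling division

models = [f"model_{i}" for i in range(250)]
print(page_count(models))          # → 3
print(len(page_slice(models, 3)))  # → 50
```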

Local Data Storage
When hovering over a snapshot, users can view the model's name and two buttons, as shown in Figure 2. The download button enables users to choose and download either the snapshot (in PNG format) or the model itself (in GLTF format for mesh models or PLY format for point cloud models). Another method for downloading data while browsing is discussed in Section 3.2. The left button is part of the retrieval process and is explained in Section 3.3. These details are illustrated in Figure 2, where the model of interest is enclosed within a red square, aiding users in navigating the interface effectively.

Browsing
By clicking on the snapshot of a model, a model viewer appears. This component provides a dynamic space where users can interact with 3D models (rotate and zoom), or view images in full-screen (in case the selected type from the dropdown is 'Image'). It also contains a button for downloading the model (with the same options as before).
Figures 3 and 4 illustrate the 3D model viewer for a point cloud model and for a mesh model, respectively.

Retrieval
By clicking the left button mentioned earlier (Figure 2 in Section 3.1), users initiate the appearance of a popup window, as depicted in Figure 5. This window offers users the option to select the types of models for retrieval, whether uni-modal, cross-modal, or multimodal. For instance, if the query model is a mesh, selecting Mesh yields uni-modal retrieval, selecting Point Cloud yields cross-modal retrieval, and selecting two or more modalities yields multimodal retrieval. Upon clicking the search button, a query is dispatched to the MuseHash service (Section 2). This service then returns models of the selected type(s) similar to the chosen one, ranked from most to least relevant. Subsequently, the system opens the similar results panel (Figure 6) for further exploration by the user.
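The mapping from the user's selection to a retrieval mode can be sketched as follows. The function and string labels are illustrative, not the actual 3DMSE service API:

```python
# Sketch of classifying a search request as uni-modal, cross-modal or
# multimodal, following the definitions above (names are hypothetical).

def retrieval_mode(query_modality, selected_modalities):
    """Map the query's modality and the selected target modalities
    to one of the three retrieval types."""
    if len(selected_modalities) > 1:
        return "multimodal"
    if selected_modalities == [query_modality]:
        return "uni-modal"
    return "cross-modal"

print(retrieval_mode("mesh", ["mesh"]))                 # → uni-modal
print(retrieval_mode("mesh", ["point_cloud"]))          # → cross-modal
print(retrieval_mode("mesh", ["mesh", "point_cloud"]))  # → multimodal
```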
In the similar results panel, users can explore models that closely relate to the selected one. Each displayed model within this panel provides users with access to the same set of buttons and functionalities available in the initial results panel, ensuring consistent interaction capabilities. Positioned prominently at the top of the panel is a snapshot of the selected model, allowing users to quickly recall the query while exploring the results.

EXAMPLE RETRIEVAL SCENARIOS
To illustrate the functionality of 3DMSE, we present two scenarios: one uni-modal and one multimodal. We provide a comprehensive explanation of the uni-modal scenario and a brief description of the multimodal scenario.

Uni-modal Scenario
In this scenario, the objective is to pinpoint a particular building (identified as "COMMERCIALhotel_building_mesh1286"). This specific structure is described as being tall and slender. A detailed step-by-step guide is provided to facilitate a smooth navigation experience through the interface, assisting users in efficiently locating the designated mesh hotel building. Given that the uni-modal scenario has been described in detail, we offer for completeness a concise multimodal scenario below.

Multimodal Scenario
The objective now is to find the same building as in the uni-modal scenario, but this time taking advantage of two modalities (image and mesh). As before, we pick the building named "COMMERCIALhotel_building_mesh0555" and choose Image and Mesh in the modality-selection popup window. A list of relevant results is presented, and the target "COMMERCIALhotel_building_mesh1286" appears in the 4th row, 2nd column. Thus, by utilising two modalities instead of one, we reach the target in one step fewer.

CONCLUSIONS
In summary, our investigation demonstrates that the 3DMSE system effectively assists users in retrieving and managing 3D data. By combining the MuseHash retrieval approach with a user-friendly interface, 3DMSE offers a straightforward way for users to explore and interact with different types of 3D models. Its versatility in handling various data formats makes it a potentially valuable tool across different application domains.

Figure 1: The initial results panel and its dropdown lists.

Figure 2: Hovering over a snapshot and available buttons.

Figure 3: The 3D model viewer for a point cloud model.

Figure 4: The 3D model viewer for a mesh model.

Figure 6: The similar results panel and a model's details.

(1) Begin by accessing the interface and setting the dataset to BuildingNet_v0 and the model type to Mesh using the respective drop-down lists.
(2) Use the pagination control to select page 1 and update the interface's display.
(3) Utilize the selection mechanism on the UI to pick a tall and narrow building from the list, such as the building named "COMMERCIALhotel_building_mesh0555".
(4) Click the icon positioned at the top-left corner to initiate a search, and choose Mesh in the modality-selection popup window.
(5) A list of relevant results is presented. Explore this list and locate the building image named "COMMERCIALhotel_building_mesh1285" in the 1st row, 1st column.
(6) Click on the image "COMMERCIALhotel_building_mesh1285" to set it as the new query, and again choose Mesh in the modality-selection popup window.
(7) In the returned list, find the target image named "COMMERCIALhotel_building_mesh1286" in the 1st row, 1st column, and click on it to access additional details or related information.