Create Index

Settings are supplied when the index is created. Any setting you omit takes the default value listed in the tables below.

POST /indexes/{index_name}

Create an index with (optional) settings. This endpoint accepts the application/json content type.

Path parameters

Name	Type	Description
`index_name`	String	name of the index
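
As a rough sketch of the request shape, the snippet below builds (but does not send) the POST using only Python's standard library. The helper name `build_create_index_request` is hypothetical and not part of the Marqo client. The base URL `http://localhost:8882` is borrowed from the cURL example on this page and is assumed to match your deployment.

```python
import json
import urllib.request
from urllib.parse import quote


def build_create_index_request(base_url, index_name, settings=None):
    """Build (without sending) a POST /indexes/{index_name} request.

    Hypothetical helper for illustration; not part of the Marqo client.
    """
    # index_name is a path parameter, so percent-encode it defensively.
    url = f"{base_url.rstrip('/')}/indexes/{quote(index_name, safe='')}"
    # Settings are optional per the endpoint description; this sketch
    # sends an empty JSON object when none are given.
    body = json.dumps(settings or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_index_request(
    "http://localhost:8882",
    "my-first-index",
    {"number_of_shards": 3, "number_of_replicas": 0},
)
print(req.get_method(), req.full_url)
# To actually send it against a running instance: urllib.request.urlopen(req)
```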

Body Parameters

The settings for the index. The settings are represented as a nested JSON object.

Name	Type	Default value	Description
`index_defaults`	Dictionary	`""`	The index defaults object
`number_of_shards`	Integer	`3`	The number of shards for the index
`number_of_replicas`	Integer	`0`	The number of replicas for the index

Index Defaults Object

The index_defaults object contains the default settings for the index. The parameters are as follows:

Name	Type	Default value	Description
`treat_urls_and_pointers_as_images`	Boolean	`""`	Fetch images from pointers
`model`	String	`hf/all_datasets_v4_MiniLM-L6`	The model to use to vectorise doc content in `add_documents()` calls for the index
`model_properties`	Dictionary	`""`	The model properties object corresponding to `model` (for custom models)
`search_model`	String	The value of `model`	The model to use to vectorise query content in `search()` or `bulk_search()` calls for the index
`search_model_properties`	Dictionary	`""`	The model properties object corresponding to `search_model` (for custom models)
`normalize_embeddings`	Boolean	`true`	Normalize the embeddings to have unit length
`text_preprocessing`	Dictionary	`""`	The text preprocessing object
`image_preprocessing`	Dictionary	`""`	The image preprocessing object
`ann_parameters`	Dictionary	`""`	The ANN algorithm parameter object
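
Because the settings are a nested object, a settings body can name only the keys it changes. The sketch below illustrates that idea on the client side by overlaying a partial body onto the default values from the sample settings object on this page. How Marqo itself resolves omitted keys is not specified here, so treat the merge behaviour as an assumption; `DEFAULT_SETTINGS` and `deep_merge` are hypothetical names.

```python
import copy

# Values copied from the sample settings object on this page.
DEFAULT_SETTINGS = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "hf/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": True,
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 0,
            "split_method": "sentence",
        },
        "image_preprocessing": {"patch_method": None},
        "ann_parameters": {
            "space_type": "cosinesimil",
            "parameters": {"ef_construction": 128, "m": 16},
        },
    },
    "number_of_shards": 3,
    "number_of_replicas": 0,
}


def deep_merge(defaults, overrides):
    """Return a new dict with `overrides` recursively laid over `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


partial = {"index_defaults": {"text_preprocessing": {"split_length": 3}}}
effective = deep_merge(DEFAULT_SETTINGS, partial)
print(effective["index_defaults"]["text_preprocessing"])
```

Printing the effective settings before creating an index is a cheap way to see which values you are relying on implicitly.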

Text Preprocessing Object

The text_preprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:

Name	Type	Default value	Description
`split_length`	Integer	`2`	The length of the chunks after splitting by split_method
`split_overlap`	Integer	`0`	The length of overlap between adjacent chunks
`split_method`	String	`sentence`	The method by which text is chunked (`character`, `word`, `sentence`, or `passage`)
`override_text_chunk_prefix`	String	`null`	A string to be added to the start of all text chunks in documents before vectorisation. Only affects vectors generated. Text itself will not be stored. Overrides `model_properties`-level prefix.
`override_text_query_prefix`	String	`null`	A string to be added to the start of all search text queries before vectorisation. Only affects vectors generated. Text itself will not be returned or used for lexical search. Overrides `model_properties`-level prefix.
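
To make the interaction of `split_length` and `split_overlap` concrete, here is a minimal sketch of chunking with the `word` split method. This is not Marqo's splitter; `chunk_words` is a hypothetical function, and it assumes words are separated by whitespace and that the overlap is counted in the same unit as the length.

```python
def chunk_words(text, split_length=2, split_overlap=0):
    """Split text into chunks of `split_length` words.

    Adjacent chunks share `split_overlap` words. Illustrative only.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_words("one two three four five", split_length=3, split_overlap=1))
```

A larger overlap produces more chunks per document, so there are more vectors to store and search.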

Image Preprocessing Object

The image_preprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:

Name	Type	Default value	Description
`patch_method`	String	`null`	The method by which images are chunked (`simple` or `frcnn`)

ANN Algorithm Parameter object

The ann_parameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:

Name	Type	Default value	Description
`space_type`	String	`cosinesimil`	The function used to measure the distance between two points in ANN (`l1`, `l2`, `linf`, or `cosinesimil`).
`parameters`	Dictionary	`""`	The hyperparameters for the ANN method (which is always `hnsw` for Marqo).

HNSW Method Parameters Object

The parameters object accepts the following keys:

Name	Type	Default value	Description
`ef_construction`	Integer	`128`	The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. It is recommended to keep this between 2 and 800 (the maximum is 4096).
`m`	Integer	`16`	The number of bidirectional links that the plugin creates for each new element. Increasing or decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
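
The ranges in the table can be checked before an index is created. The following is a minimal sketch under the assumption that values outside the recommended ranges deserve a warning and that only the stated maximum of 4096 is a hard limit. `check_hnsw_parameters` is a hypothetical helper, not a Marqo function.

```python
def check_hnsw_parameters(ef_construction=128, m=16):
    """Return a list of warnings for values outside the recommended ranges.

    Raises ValueError if ef_construction exceeds the documented maximum.
    """
    if ef_construction > 4096:
        raise ValueError("ef_construction must not exceed 4096")

    warnings = []
    if not 2 <= ef_construction <= 800:
        warnings.append(
            f"ef_construction={ef_construction} is outside the recommended 2-800"
        )
    if not 2 <= m <= 100:
        warnings.append(f"m={m} is outside the recommended 2-100")
    return warnings


for message in check_hnsw_parameters(ef_construction=1000, m=16):
    print("warning:", message)
```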

Model Properties Object

This flexible object is used by both model_properties and search_model_properties to set up models that aren't available in Marqo by default (models available by default are listed here). Its structure varies depending on the model.

For Open CLIP models, see here for model_properties format and example usage.

For Generic SBERT models, see here for model_properties format and example usage.

These are used in the same manner for search_model_properties.

Search Model

Upon index creation, you can use the search_model setting to specify a model that is used only for vectorising search queries. This is useful if you want search/bulk_search to use a different model from add_documents. If the two do not need to differ, omit search_model and it will default to the value of model.

Note: You cannot specify a search_model without also specifying a model. Attempting to do this will result in an error. Also, the search_model must have the same dimensions as model.
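
The two rules in the note can be checked client-side before a request is sent. This is a sketch only: `validate_search_model` is a hypothetical helper, and it can compare dimensions only when both model_properties and search_model_properties declare them, since the dimensions of built-in models are not available to this snippet.

```python
def validate_search_model(index_defaults):
    """Raise ValueError if the search_model rules from the note are violated."""
    if "search_model" in index_defaults and "model" not in index_defaults:
        raise ValueError("search_model requires model to be specified as well")

    model_dims = index_defaults.get("model_properties", {}).get("dimensions")
    search_dims = index_defaults.get("search_model_properties", {}).get("dimensions")
    # Only comparable when both custom property objects state their dimensions.
    if model_dims is not None and search_dims is not None:
        if model_dims != search_dims:
            raise ValueError(
                f"search_model dimensions ({search_dims}) must equal "
                f"model dimensions ({model_dims})"
            )


try:
    validate_search_model({"search_model": "my_custom_search_model"})
except ValueError as err:
    print("rejected:", err)
```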

Below is a sample index settings JSON object that defines both a model and search_model with search_model_properties.

{
    "index_defaults": {
        "model": "ViT-B/32",
        "search_model": "my_custom_search_model",
        "search_model_properties": {
            "name": "ViT-B-32-quickgelu",
            "dimensions": 512,
            "url": "https://github.com/mlfoundations/open_clip/releases/download/v0.2-weights/vit_b_32-quickgelu-laion400m_avg-8a00ab3c.pt",
            "type": "open_clip"
        }
    }
}

Below is a sample index settings JSON object. When using the Python client, pass this dictionary as the settings_dict parameter for the create_index method.

{
    "index_defaults": {
        "treat_urls_and_pointers_as_images": false,
        "model": "hf/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": true,
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 0,
            "split_method": "sentence"
        },
        "image_preprocessing": {
            "patch_method": null
        },
        "ann_parameters" : {
            "space_type": "cosinesimil",
            "parameters": {
                "ef_construction": 128,
                "m": 16
            }
        }
    },
    "number_of_shards": 3,
    "number_of_replicas": 0
}

Example

The same request is shown first with cURL, then with the Python client.

curl -XPOST 'http://localhost:8882/indexes/my-first-index' -H 'Content-type:application/json' -d '
{
    "index_defaults": {
        "treat_urls_and_pointers_as_images": false,
        "model": "hf/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": true,
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 0,
            "split_method": "sentence"
        },
        "image_preprocessing": {
            "patch_method": null
        },
        "ann_parameters" : {
            "space_type": "cosinesimil",
            "parameters": {
                "ef_construction": 128,
                "m": 16
            }
        }
    },
    "number_of_shards": 3,
    "number_of_replicas": 0
}'

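import marqo

# Connect to the Marqo instance used in the cURL example
mq = marqo.Client(url="http://localhost:8882")

index_settings = {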
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        "model": "hf/all_datasets_v4_MiniLM-L6",
        "normalize_embeddings": True,
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 0,
            "split_method": "sentence"
        },
        "image_preprocessing": {
            "patch_method": None
        },
        "ann_parameters" : {
            "space_type": "cosinesimil",
            "parameters": {
                "ef_construction": 128,
                "m": 16
            }
        }
    },
    "number_of_shards": 3,
    "number_of_replicas": 0
}
mq.create_index("my-first-index", settings_dict=index_settings)

Response: `200 OK`

{"acknowledged":true, "shards_acknowledged":true, "index":"my-first-index"}

No Model

You may want to use Marqo to store and search vectors that you have already generated. In this case, you can create your index with no model. To do this, set model or search_model to the string "no_model" and define model_properties or search_model_properties with only the dimensions key. Set dimensions to the size of the vectors you intend to use for this index.

Note that for a no_model index, you will not be able to vectorise any raw text documents or search queries. To add documents, use the custom_vector object field, and to search, use the context parameter with no q defined.

Example

import marqo

mq = marqo.Client(url="http://localhost:8882")

index_settings = {
    "index_defaults": {
        "model": "no_model",
        "model_properties": {
            "dimensions": 123   # Put your custom vector size here!
        }
    },
}
mq.create_index("my-no-model-index", settings_dict=index_settings)
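
Since a no_model index is tied to one vector size, it can help to check your pre-computed vectors against the configured dimensions before sending them. The sketch below assumes a mismatched length would be rejected or be unusable; `check_vector_size` is a hypothetical helper, and the settings dictionary mirrors the example above.

```python
def check_vector_size(vector, index_settings):
    """Raise ValueError if `vector` does not match the index's dimensions."""
    dimensions = index_settings["index_defaults"]["model_properties"]["dimensions"]
    if len(vector) != dimensions:
        raise ValueError(
            f"vector has {len(vector)} values, index expects {dimensions}"
        )
    return vector


no_model_settings = {
    "index_defaults": {
        "model": "no_model",
        "model_properties": {"dimensions": 4},  # small size for illustration
    },
}

check_vector_size([0.1, 0.2, 0.3, 0.4], no_model_settings)
print("vector size matches the index")
```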
