ColBERT: Contextualized Late Interaction over BERT, explained with a tutorial

Astarag Mohapatra
30 min read · Mar 9, 2024

In this article, we will go over the ColBERT architecture, both v1 and v2. ColBERT is a neural information retrieval (IR) technique that can help us build RAG applications with LLMs. I will also walk through a tutorial on building an end-to-end RAG system with data from SEC filings and earnings calls.

Let’s start with the ColBERT architecture

Tutorial Link

ColBERT v1

  • An information retriever fetches the passages in our dataset that are semantically similar to a query. ColBERT is a neural IR technique that performs late-stage interaction between each query token and each token of a passage in the dataset.
Colbertv1 paper
  • In architecture (a), we have a BERT-based encoder. The query and a document (the same thing as the passage mentioned above) each make one forward pass, and the final outputs are compared with a similarity function such as cosine similarity, so we can select the top-k most similar documents. However, this compresses the entire query and the entire document into a single vector each, which lacks expressivity. There is also very limited query-document interaction, because the interaction only happens after everything has been compressed into a single vector.
  • In architecture (b), the query and document interact at the token level through a similarity function. The result is a very sparse matrix that is fed into a downstream neural model to predict the similarity.
  • It bears some resemblance to another SOTA neural IR model, SPLADEv2. There, instead of a sparse matrix between the query and the document, we have one sparse matrix between the query and the entire vocabulary and another between the document and the entire vocabulary. The two are then combined to compute the similarity.
Taken from this YouTube video by Chris Potts
  • In architecture (c), we concatenate the query and the document into a single input for one forward pass. The input starts with a special task-prediction token, and special tokens mark the start of the query and of the document.

[SIM] + [Q] + Query text + [D] + Document text

  • Here the [SIM] special token instructs the model to predict the similarity score, [Q] marks the start of the query text, and [D] marks the start of the document text. After tokenization, we send this concatenated text to a BERT-base encoder, which outputs the similarity score in one forward pass. The model is trained on positive examples (a query paired with a relevant document), where the objective is to predict 1, and on negative examples (a query paired with a non-relevant document, chosen by hard-negative mining), where the objective is to predict -1. However, this is a very expensive operation: for every query we need one forward pass per document in our dataset to get its similarity score, so it does not scale well.
  • Architecture (d) is the ColBERT architecture of late-stage interaction. It addresses the low expressivity of single-vector representations by keeping a representation for every token, and it still allows query-document interaction without an expensive forward pass over every document for each new query: document token embeddings can be computed once offline, so only the query has to be encoded at search time.
  • I found another diagram that explains ColBERT better. Below you can see that ColBERT only lets the query and the document interact at the very end, on their output embeddings: each query token embedding is compared with each document token embedding by cosine similarity. For each query token we then take the maximum similarity over all document tokens, and finally we sum these maxima to get the final similarity score.
Taken from this YouTube video by Chris Potts
  • The authors filter out punctuation tokens before this late interaction. The idea is to find, for each query token, the document token that is most similar to it (by taking the max sim), and then to sum these maxima to get the final similarity score.
  • Max-pooling is preferred over mean-pooling because averaging asks how similar a given query token is to the document as a whole. That effectively collapses the document's token vectors back into a single vector before the interaction, which diminishes expressiveness.
  • ColBERT is trained with hard-negative mining on the MS MARCO dataset. Query inputs are built by placing the [Q] special token right after the [CLS] token; likewise, document inputs have [CLS] followed by [D]. For each query we pick a positive example, a document that is relevant to the query (d+), and a negative example, a non-relevant document retrieved with BM25 (d−). The model is then optimized with a pairwise softmax cross-entropy loss over the computed scores of d+ and d−.
  • End-to-end retrieval works in two steps: (i) a nearest-neighbour search with the FAISS library over the document token embeddings fetches candidate passages, and (ii) ColBERT acts as a re-ranker over those candidates to produce the top-K documents.
The Colbert paper mentioned the FAISS search
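To make the scoring and the training objective above concrete, here is a minimal sketch in plain Python. The toy 2-dimensional "embeddings" are made-up numbers rather than BERT outputs, and the function names (maxsim_score, pairwise_softmax_loss) are my own, not taken from the ColBERT codebase. In practice this would be a batched matrix operation in PyTorch or NumPy.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: keep the best-matching document token for each
    query token (max), then sum those maxima over the query tokens."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)

def pairwise_softmax_loss(score_pos, score_neg):
    """Cross-entropy of softmax([score_pos, score_neg]) with d+ as the target."""
    log_norm = math.log(math.exp(score_pos) + math.exp(score_neg))
    return log_norm - score_pos

# Toy token embeddings (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_relevant = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # perfect match for both query tokens
doc_irrelevant = [[1.0, 1.0], [-1.0, 0.0]]

score_pos = maxsim_score(query, doc_relevant)
score_neg = maxsim_score(query, doc_irrelevant)
loss = pairwise_softmax_loss(score_pos, score_neg)
print(score_pos, score_neg, loss)
```

The relevant document scores higher because every query token finds a perfectly aligned document token, and the loss shrinks as the gap between the d+ and d− scores grows.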

That was the ColBERT v1 architecture. Let’s move on to ColBERT v2.

ColBERT v2

  • ColBERT v2 improves training by distilling the scores of a cross-encoder re-ranking model (a 22M-parameter MiniLM) into the ColBERT architecture. The training pipeline is as follows:

A BERT-based ColBERT model trained as in v1 fetches candidate passages for each training query, 64 per training example → the MiniLM cross-encoder predicts a similarity score between the query and each fetched passage → the ColBERT model is then optimized to minimize the KL divergence between its own score distribution and the MiniLM score distribution. In-batch negative documents are also used, so that the model learns to rank positive documents above negative ones.
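The KL-divergence step in this pipeline can be sketched as follows. This is a minimal illustration, not the paper's training code: teacher_scores stands for the scores of the model being imitated and student_scores for the scores of the model being trained, the numbers are invented, and the direction KL(teacher || student) is my assumption.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]

def kl_distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) between the two score distributions
    computed over the same list of passages for one query."""
    t_log = log_softmax(teacher_scores)
    s_log = log_softmax(student_scores)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_log, s_log))

# Scores for four fetched passages of one query (made-up values).
teacher = [4.0, 1.0, 0.5, -2.0]
student_close = [3.9, 1.2, 0.4, -1.8]   # ranks passages like the teacher
student_far = [-2.0, 0.5, 1.0, 4.0]     # ranks passages in the opposite order

loss_close = kl_distillation_loss(teacher, student_close)
loss_far = kl_distillation_loss(teacher, student_far)
print(loss_close, loss_far)
```

Because both score lists go through a softmax first, only the relative ordering and spacing of the scores matter, not their absolute scale, which is convenient when the two models produce scores in different ranges.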

  • The ColBERTv2 paper also innovates on indexing. In ColBERT, we store a vector for every document token, which takes a lot of storage space. Is there a way to compress it?
  • The answer is centroid-based indexing and retrieval, introduced with ColBERTv2 and later extended in the PLAID paper.
Taken from this YouTube video by Chris Potts
  • Here we store the token embeddings by first clustering them and finding their centroids. ColBERT v1 stored every 128-dimensional token embedding at 16-bit precision, which takes 256 bytes per vector; ColBERTv2 keeps 128-dimensional embeddings but stores them in compressed form. In addition to the id of the nearest centroid, we also store the residual, that is, the difference between the original document token embedding and that centroid.
  • In the above example, we have C1 and r_1³ for the token w_1³ in doc1. Hence, we can store the embeddings in compressed form, and during retrieval we find the centroids closest to each query token and then recover the (approximate) full embeddings by decompressing, that is, by adding the residual back to the centroid.
From the Colbertv2 paper
  • This saves a lot of disk space for our offline index. Each dimension of the residual is quantized and generally stored in just 1 or 2 bits.
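Here is a small sketch of the compress/decompress idea in plain Python, assuming the centroids are already known (they would normally come from k-means over the token embeddings). The four bucket values are hand-picked for this toy example, and the helper names are hypothetical, so treat this as an illustration of the idea rather than the actual ColBERTv2 codec. The "residual" is the difference between a token embedding and its nearest centroid.

```python
def nearest_centroid(vec, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to vec."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

# 2-bit codebook: four representative residual values per dimension (assumed).
BUCKET_VALUES = [-0.3, -0.1, 0.1, 0.3]

def quantize(value):
    """Map one residual component to the code (0-3) of the closest bucket value."""
    return min(range(len(BUCKET_VALUES)), key=lambda code: abs(value - BUCKET_VALUES[code]))

def compress(vec, centroids):
    """Store a token embedding as (centroid id, one 2-bit code per dimension)."""
    cid = nearest_centroid(vec, centroids)
    residual = [a - b for a, b in zip(vec, centroids[cid])]
    return cid, [quantize(r) for r in residual]

def decompress(cid, codes, centroids):
    """Approximate the original embedding: centroid plus de-quantized residual."""
    return [c + BUCKET_VALUES[code] for c, code in zip(centroids[cid], codes)]

centroids = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]   # toy centroids
token_embedding = [0.9, 0.12, -0.25]             # toy document token embedding

cid, codes = compress(token_embedding, centroids)
approx = decompress(cid, codes, centroids)
print(cid, codes, approx)
```

For a 128-dimensional embedding, 2 bits per dimension comes to 32 bytes of residual codes plus a few bytes for the centroid id, at the cost of a small reconstruction error.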

PLAID (Performance Optimized Late Interaction Driver)

  • This paper is an extension of ColBERTv2, and it is about using the centroids to make retrieval faster. The paper compares the retrieval times of ColBERTv2 and PLAID.
From the PLAID paper
  • In the paper, the authors argue that ColBERTv2 spends a significant amount of time on index lookup and decompression because of the transfer overhead from CPU to GPU. After matching with the centroids, ColBERTv2 has to fetch the centroids and residuals to reconstruct the real token-level document representations, and transferring this data to the GPU takes time, which is pure overhead.
  • The PLAID algorithm extends ColBERTv2 by iteratively matching the query against nearby centroids and pruning irrelevant documents based on a threshold value.
Taken from the PLAID paper
  • Here there are four stages. First, we match the centroids with the query and retrieve the documents attached to the top (nprobe) centroids. Then, instead of jumping directly to late interaction (Stage 4) as ColBERTv2 does, we score the candidates using centroids only, disregarding centroids whose similarity scores fall below a certain threshold. Next, we match the remaining candidates with the query again and keep only a quarter of them (empirically, the authors found this to perform better).
  • Hence, far fewer candidates reach the full late interaction, which makes retrieval more efficient.
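The two filtering stages can be sketched roughly as below. This is a simplified plain-Python illustration under my own assumptions: each document is represented only by the centroid ids of its tokens, the parameter names threshold and ndocs are illustrative, and the final stage (decompression plus exact late interaction on the survivors) is left out. The real PLAID implementation uses optimized CPU and GPU kernels.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centroid_scores(query_tokens, centroids):
    """scores[i][c] = similarity of query token i with centroid c."""
    return [[dot(q, c) for c in centroids] for q in query_tokens]

def approx_doc_score(scores, doc_centroid_ids, threshold=None):
    """Centroid-only MaxSim: every document token is replaced by its centroid.
    With a threshold, centroids whose best score over all query tokens is
    below it are ignored."""
    ids = set(doc_centroid_ids)
    if threshold is not None:
        ids = {c for c in ids if max(row[c] for row in scores) >= threshold}
    if not ids:
        return 0.0
    return sum(max(row[c] for c in ids) for row in scores)

def plaid_filter(query_tokens, centroids, docs, threshold, ndocs):
    """Return the candidates that would go on to full late interaction."""
    scores = centroid_scores(query_tokens, centroids)
    # Centroid interaction with pruning: keep the best ndocs candidates.
    pruned = sorted(docs, reverse=True,
                    key=lambda d: approx_doc_score(scores, docs[d], threshold))[:ndocs]
    # Centroid interaction without pruning: keep a quarter of those.
    keep = max(1, ndocs // 4)
    return sorted(pruned, reverse=True,
                  key=lambda d: approx_doc_score(scores, docs[d]))[:keep]

centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
query = [[1.0, 0.0], [0.6, 0.8]]
# doc id -> centroid ids of its tokens (toy data)
docs = {"doc_a": [0, 1, 1], "doc_b": [0, 2], "doc_c": [2, 3], "doc_d": [1, 3], "doc_e": [3]}

survivors = plaid_filter(query, centroids, docs, threshold=0.5, ndocs=4)
print(survivors)
```

Only the surviving documents would need their residuals fetched and decompressed, which is where the saving comes from.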

NOW LET’S START WITH THE TUTORIAL

In particular, we will see how to filter document IDs so that we can re-rank documents based on some metadata.

Our data looks like the following:

Document(page_content='During the three months ended April\xa01, 2023, Katherine L. Adams, Timothy D. Cook, Luca Maestri, Deirdre O’Brien and Jeffrey Williams, each an officer for purposes of Section 16 of the Exchange Act, had equity trading plans in place in accordance with Rule 10b5-1(c)(1) under the Exchange Act. An equity trading plan is a written document that preestablishes the amounts, prices and dates (or formula for determining the amounts, prices and dates) of future purchases or sales of the Company’s stock, including sales of shares acquired under the Company’s employee and director equity plans.', metadata={'accessionNumber': '000032019323000064', 'filing_type': '10-Q2', 'filingDate': '2023-05-05', 'reportDate': '2023-04-01', 'sectionName': 'OTHER_INFORMATION'})

Now we build a metadata dictionary that keeps track of the index value of each document and its metadata.
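One way to build such a dictionary is sketched below. It is not the exact tutorial code: the Document dataclass is a minimal stand-in for the Document objects shown above, the page contents are placeholders, and build_metadata_index and filter_ids are hypothetical helper names. The layout (filing type → index as a string → metadata) follows the output shown next.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for the Document objects shown above (assumed shape)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_metadata_index(documents):
    """Map filing_type -> {document index as string -> metadata}."""
    index = {}
    for position, doc in enumerate(documents):
        filing_type = doc.metadata["filing_type"]
        index.setdefault(filing_type, {})[str(position)] = doc.metadata
    return index

def filter_ids(metadata_index, filing_type, **conditions):
    """Index values of one filing type whose metadata matches every condition."""
    entries = metadata_index.get(filing_type, {})
    return [doc_id for doc_id, meta in entries.items()
            if all(meta.get(key) == value for key, value in conditions.items())]

# Placeholder documents; only the metadata fields matter here.
docs = [
    Document("placeholder text A", {"accessionNumber": "000032019323000064",
                                    "filing_type": "10-Q2",
                                    "sectionName": "OTHER_INFORMATION"}),
    Document("placeholder text B", {"accessionNumber": "000101872424000008",
                                    "filing_type": "10-K",
                                    "sectionName": "RISK_FACTORS"}),
    Document("placeholder text C", {"accessionNumber": "000101872424000008",
                                    "filing_type": "10-K",
                                    "sectionName": "LEGAL_PROCEEDINGS"}),
]

metadata_index = build_metadata_index(docs)
risk_ids = filter_ids(metadata_index, "10-K", sectionName="RISK_FACTORS")
print(risk_ids)
```

The returned IDs can then be used to restrict which documents are passed on for re-ranking.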

#For document name 10-K
10-K
#Indexes in 10-K
{'186': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '187': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '188': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '189': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '190': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '191': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '192': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '193': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '194': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '195': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '196': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '197': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '198': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 
'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '199': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '200': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '201': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '202': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '203': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '204': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '205': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '206': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '207': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '208': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '209': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '210': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 
'RISK_FACTORS'}, '211': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '212': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '213': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '214': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '215': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '216': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '217': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '218': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '219': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '220': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '221': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '222': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '223': {'accessionNumber': '000101872424000008', 
'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '224': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RISK_FACTORS'}, '225': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '226': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '227': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '228': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '229': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '230': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '231': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '232': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '233': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '234': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '235': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 
'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '236': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '237': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '238': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '239': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '240': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '241': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '242': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '243': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '244': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '245': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'LEGAL_PROCEEDINGS'}, '246': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'MINE_SAFETY'}, '247': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': 
'2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'MARKET_RISK_DISCLOSURES'}, '248': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'MARKET_RISK_DISCLOSURES'}, '249': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'MARKET_RISK_DISCLOSURES'}, '250': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'MARKET_RISK_DISCLOSURES'}, '251': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '252': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '253': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '254': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '255': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '256': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '257': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '258': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '259': {'accessionNumber': '000101872424000008', 
'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '260': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '261': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '262': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '263': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '264': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '265': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '266': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '267': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '268': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '269': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '270': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '271': {'accessionNumber': 
'000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '272': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '273': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '274': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '275': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '276': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '277': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '278': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '279': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '280': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '281': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '282': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '283': 
{'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '284': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '285': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '286': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '287': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '288': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '289': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '290': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '291': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '292': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '293': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '294': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 
'FINANCIAL_STATEMENTS'}, '295': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '296': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '297': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '298': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '299': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '300': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '301': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '302': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '303': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '304': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '305': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '306': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': 
'2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '307': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '308': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FINANCIAL_STATEMENTS'}, '309': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'CONTROLS_AND_PROCEDURES'}, '310': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'CONTROLS_AND_PROCEDURES'}, '311': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'CONTROLS_AND_PROCEDURES'}, '312': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'CONTROLS_AND_PROCEDURES'}, '313': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'CONTROLS_AND_PROCEDURES'}, '314': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'MANAGEMENT'}, '315': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '316': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '317': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '318': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': 
'2023-12-31', 'sectionName': 'COMPENSATION'}, '319': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '320': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '321': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '322': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '323': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '324': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '325': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '326': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '327': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '328': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '329': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '330': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '331': {'accessionNumber': 
'000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '332': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '333': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '334': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '335': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '336': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '337': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '338': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '339': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '340': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '341': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '342': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '343': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 
'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '344': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '345': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '346': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '347': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '348': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '349': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '350': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '351': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '352': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '353': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '354': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '355': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '356': 
{'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '357': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '358': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '359': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '360': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '361': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '362': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'COMPENSATION'}, '363': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'RELATED_PARTY_TRANSACTIONS'}, '364': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'EXHIBITS'}, '365': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '366': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '367': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '368': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 
'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '369': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '370': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '371': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '372': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '373': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '374': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '375': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '376': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '377': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '378': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '379': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '380': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 
'FORM_SUMMARY'}, '381': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '382': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '383': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '384': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '385': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '386': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '387': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '388': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '389': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '390': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '391': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}, '392': {'accessionNumber': '000101872424000008', 'filing_type': '10-K', 'filingDate': '2024-02-02', 'reportDate': '2023-12-31', 'sectionName': 'FORM_SUMMARY'}}
from src.vectorDatabase import get_all_docs
from collections import defaultdict
import concurrent.futures
import colbert
from colbert import Indexer, Searcher
from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert.data import Queries, Collection
from src.config import *
import os
from functools import lru_cache
import torch
os.environ['COLBERT_LOAD_TORCH_EXTENSION_VERBOSE'] = "True"


checkpoint = 'colbert-ir/colbertv2.0'
# Number of bits used to store each dimension of the residual between a document token embedding and its nearest centroid
nbits = 2
# Maximum number of tokens kept per passage
DOC_MAXLEN = 400
EXPERIMENT_NAME = 'Finance-Data'
# Batch size for indexing
INDEX_BSIZE = 256
# Number of k-means iterations used to cluster the token embeddings into centroids
KMEANS_ITER = 8
with Run().context(RunConfig(nranks=1, experiment=EXPERIMENT_NAME)):  # nranks specifies the number of GPUs to use
    # kmeans_niters specifies the number of iterations of k-means clustering; 4 is a good and fast default.
    # Consider larger numbers for small datasets (this tutorial uses 8).
    config = ColBERTConfig(doc_maxlen=DOC_MAXLEN, nbits=nbits, kmeans_niters=KMEANS_ITER, index_bsize=INDEX_BSIZE)

    indexer = Indexer(checkpoint=checkpoint, config=config)
    index_name = f'SEC.Earningcalls.{ticker}_{year}.{nbits}bits'
    indexer.index(name=index_name, collection=sentences, overwrite=True)

The above code builds the index from sentences, the collection of passages passed to indexer.index in the last line.
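To make the nbits setting more concrete, here is a minimal sketch of residual compression in plain Python. It is not ColBERT's actual implementation: the bucket cutoffs and reconstruction values below are made-up numbers (the library derives its own from the data), and the storage estimate assumes 128-dimensional embeddings and a 4-byte centroid id. The idea is only that each token embedding is stored as a centroid id plus a small code per dimension describing how far the embedding sits from that centroid.

```python
def quantize_residual(embedding, centroid, bucket_edges):
    """Return one small integer code per dimension of (embedding - centroid)."""
    codes = []
    for value, center in zip(embedding, centroid):
        residual = value - center
        # The code is the number of bucket edges the residual exceeds.
        codes.append(sum(residual > edge for edge in bucket_edges))
    return codes


def dequantize_residual(codes, centroid, bucket_values):
    """Approximate the original embedding from the centroid and the codes."""
    return [center + bucket_values[code] for code, center in zip(codes, centroid)]


def bytes_per_embedding(dim, nbits, centroid_id_bytes=4):
    """Storage for one token: packed residual codes plus the centroid id (assumed 4 bytes)."""
    return dim * nbits // 8 + centroid_id_bytes


nbits = 2
# Hypothetical cutoffs and reconstruction values: 2**nbits buckets need 2**nbits - 1 edges.
bucket_edges = [-0.1, 0.0, 0.1]
bucket_values = [-0.15, -0.05, 0.05, 0.15]
assert len(bucket_edges) == 2 ** nbits - 1

# Toy 4-dimensional embedding and its nearest centroid.
embedding = [0.30, -0.12, 0.05, -0.40]
centroid = [0.25, -0.10, 0.10, -0.20]

codes = quantize_residual(embedding, centroid, bucket_edges)
approx = dequantize_residual(codes, centroid, bucket_values)
print(codes)
print(approx)
print(bytes_per_embedding(dim=128, nbits=nbits))
```

With fewer bits the index gets smaller but the reconstructed embeddings get coarser, which is the trade-off behind choosing nbits = 2 here.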

GET THE INDEX

index_name = f"SEC.Earningcalls.{ticker}_{year}.{nbits}bits"
with Run().context(RunConfig(experiment=EXPERIMENT_NAME)):
    searcher = Searcher(index=index_name, collection=sentences)

# Get the re-ranked data
def query_data_all(query: str, searcher, quarter_or_form_name: str, quarter_forms_dict, k: int = 10, device="cuda:0"):
    # Metadata for the requested quarter or form, keyed by passage id (stored as a string)
    required_quarter_form_dict = quarter_forms_dict[quarter_or_form_name]
    # Getting the relevant ids
    relevant_ids = torch.tensor([int(i) for i in required_quarter_form_dict.keys()]).to(device)
    results = searcher.search(
        query,
        # Number of passages to retrieve
        k=k,
        # Filter function that keeps only the passage ids belonging to this quarter or form
        filter_fn=lambda pids: torch.tensor(
            [pid for pid in pids if pid in relevant_ids], dtype=torch.int32).to(device))
    relevant_docs = ""
    # If it is Earnings Call data, group the passages by speaker
    if quarter_or_form_name.startswith("Q"):
        speaker_dict = {}
        # results is a tuple of (passage ids, ranks, scores)
        for passage_id, _, _ in zip(*results):
            metadata = required_quarter_form_dict[str(passage_id)]
            speaker = metadata['speaker']
            if speaker not in speaker_dict:
                speaker_dict[speaker] = ""
            # Passages from the same speaker are concatenated without a separator
            speaker_dict[speaker] += searcher.collection[passage_id]
        for speaker, text in speaker_dict.items():
            relevant_docs += speaker + ": "
            relevant_docs += text + "\n\n"
    # If it is filings data, group the passages by section name
    elif quarter_or_form_name.startswith("10"):
        section_dict = {}
        for passage_id, _, _ in zip(*results):
            metadata = required_quarter_form_dict[str(passage_id)]
            section = metadata['sectionName']
            if section not in section_dict:
                section_dict[section] = ""
            section_dict[section] += searcher.collection[passage_id]
        for section, text in section_dict.items():
            relevant_docs += section + ": "
            relevant_docs += text + "\n\n"
    return relevant_docs
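
The filtering and grouping logic of this function can be tried without a GPU or an index. The following is a small sketch on toy data, not part of the notebook: allowed_pids and group_passages are hypothetical helper names, the collection is a plain dict standing in for searcher.collection, and the metadata mimics the passage-id-to-sectionName mapping printed earlier. Unlike the function above, it joins passages from the same group with a space.

```python
from collections import defaultdict


def allowed_pids(candidate_pids, form_metadata):
    """Keep only candidate passage ids that belong to the selected form or quarter."""
    allowed = {int(pid) for pid in form_metadata}
    return [pid for pid in candidate_pids if pid in allowed]


def group_passages(pids, form_metadata, collection, group_key):
    """Group passage texts by a metadata field, preserving retrieval order."""
    grouped = defaultdict(list)
    for pid in pids:
        grouped[form_metadata[str(pid)][group_key]].append(collection[pid])
    return "".join(f"{name}: {' '.join(texts)}\n\n" for name, texts in grouped.items())


# Toy stand-ins (hypothetical data) for the metadata dict and the passage collection.
form_metadata = {
    "307": {"sectionName": "FINANCIAL_STATEMENTS"},
    "309": {"sectionName": "CONTROLS_AND_PROCEDURES"},
    "310": {"sectionName": "CONTROLS_AND_PROCEDURES"},
}
collection = {
    307: "Revenue grew.",
    309: "Controls were effective.",
    310: "No material weaknesses.",
    400: "Unrelated passage.",
}

# Pretend the retriever returned these ids, best match first.
kept = allowed_pids([310, 400, 307, 309], form_metadata)
context = group_passages(kept, form_metadata, collection, "sectionName")
print(kept)
print(context)
```

Using a set for the allowed ids keeps each membership check cheap, which matters when the filter is applied to many candidate passages.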

NOW LET’S TEST IT OUT FOR A SINGLE QUARTER

print(query_data_all(
    "How did Amazon do in AWS? ",
    searcher,
    "Q1",
    quarter_forms_dict
))

ANSWER

Andy Jassy: Language Models, they take many years to build and many billions of dollars to build. And there will be a small number of companies that want to invest that time and money, and we’ll be one of them at Amazon, but most companies don’t. And so what most companies really want and what they tell AWS is that they’d like to use one of those foundational models and then have the ability to customize it for their own proprietary data and their own needs and customer experience. And they want to do it in a way where they don’t leak their unique IP to the broader generalized model. And that’s what Bedrock is, which we just announced a week ago or so. It’s a managed foundational model service where people can run foundational models from Amazon, which we’re exposing ourselves, which we call Titan. Or they can run it from leading Large Language Models providers like AI21 and Anthropic and Stability AI. And they can run those models, take the baseline, customize them for their own purposes and then be able to run it with the same security and privacy and all the features they use for the rest of their applications in AWS. That’s very compelling for customers. And then that third layer are really the applications that are going to be built on top of those Large Language Models. So, ChatGPT is a good example of an application that’s being built. We’ll build some of those applications ourselves. So for instance, we think one of the most compelling applications that are going to be built inYes. I’ll try and answer those together because they’re somewhat related. I think when you think about machine learning, it’s useful to remember that we have had a pretty substantial investment in machine learning for 25-plus years in Amazon. It’s deeply ingrained in virtually everything we do. It fuels our personalized e-commerce recommendations. It drives the Pick Pass in our fulfillment centers. We have it in our Go stores. We have it in our Prime Air, our drones. 
It’s obviously in Alexa. And then AWS, we have 25-plus machine learning services where we have the broadest machine learning functionality and customer base by a fair bit. And so, it is deeply ingrained in our heritage. I think if you look at what’s happened over the last 9 months or so is that these Large Language Models and generative AI capabilities, they’ve been around for a while, but frankly, the models were not that compelling before about 6, 9 months ago. And they have gotten so much bigger and so much better, much more quickly that it really presents a remarkable opportunity to transform virtually every customer experience that exists and many that don’t exist that weren’t really that easily made possible before. And so, it’s very early days in that space, but probably not surprisingly, we’ve been investing in building in our own Large Language Models for several years, and we have a very large investment across the Company. And the way I would break it out, Brian, is I would say that there’s three macrosame-day deliveries and are on track to have our fastest Prime delivery speeds ever in 2023. On the advertising side, we’re continuing to buck wider advertising trends and deliver robust growth. I think there are a few reasons for it. First, even in difficult economies, most people still shop. And with the largest e-commerce shopping venue, we have a lot of customers that companies seek to reach. That, coupled with our very substantial investment in machine learning to make sure customers see relevant ads when they’re looking for various items, have meant that these advertisements have performed unusually well for brands, which makes them want to advertise on Amazon. It’s also worth noting that we’re still very early in our efforts to find a way to thoughtfully place ads in our broader video, live sports, audio and grocery properties. We have a lot of upside still in advertising. 
In AWS, what we’re seeing is enterprises continuing to be cautious in their spending in this uncertain time. Customers are looking for ways to save money however they can right now. They tell us that most of it is cost optimizing versus cost cutting, which is an interesting distinction because they say they’re cost optimizing to reallocate those resources on new customer experiences. One of the great attributes of the cloud is that you can scale seamlessly up or down as demand dictates, which is not the case with on-premise’ infrastructure. Customers want help finding ways to spend less during thisYes. In terms of the calling out health care and Kuiper in my annual letter, I think what I was trying to do in the letter was explain how we think about investing -- and how we think about our big new investments that we make. And I talked about in the letter that we look at a few things. We look at if -- if it’s successful, could it be big and move the needle for Amazon with the right ROICs? Is that experience being well served today elsewhere? Do we have some kind of differentiation? And do we have some confidence that the Company in that area? If not, can we acquire it quickly? And we’d like the answers to those questions, we will invest. Some of those investments lead to what seem like relatively straightforward investments. And I talked a little bit about category expansion and international expansion in our stores business and some of the nascent retail market segments that are large for us that we think we can have big businesses in and business to business, our Amazon business entity and grocery and things like Buy with Prime, which allow our consumers to use their Prime Benefit and other third-party websites beyond Amazon and also let merchants convert at a higher rate because Prime members are able to pay quickly and then get that fast, reliable shipping they get from Prime. 
But then there are other investments I was pointing out that sometimes don’t lead to categories that people might initially guess. And AWS was a good example of that where that seemed reallyto be built in generative AI have to do with making developers much more effective with coding assistance. And so, we built something called CodeWhisperer, which we just announced the general availability for, where developers can plug in a natural language, something like -- I want to build a video hosting website. And CodeWhisperer will bring up the code you need and the developer needs to employ and put that in production, which is really compelling. If you think about how much more productive a developer is going to be and what they’re going to spend their time on instead of rewriting code that as [Indiscernible] takes time, I think it’s a big deal. Now, to your second question, and it’s related to this top layer I was just talking about, we’re going to build a very -- every single one of our businesses inside Amazon are building on top of Large Language Models to reinvent our customer experiences, and you’ll see it in every single one of our businesses, stores, advertising, devices, entertainment. And devices, which was your specific question, is a good example of that. I think when people often ask us about Alexa, what we often share is that if we were just building a smart speaker, it would be a much smaller investment. But we have a vision, which we have conviction about that we want to build the world’s best personal assistant. And to do that, it’s difficult. It’s across a lot of domains and it’s a very broad surface area. However, if you think about the advent ofless during this challenging time. And given that it’s best for customers long term, we’ve been actively helping customers make these adjustments. 
We’ve spent a fair bit of time analyzing what we’re seeing, and I’ve spent a good chunk of time myself looking as well, and we like the fundamentals of what we’re seeing in AWS. The new customer pipeline looks strong. The set of ongoing migrations of workloads to AWS is strong. The product innovation and delivery is rapid and compelling. And people sometimes forget that 90-plus percent of global IT spend is still on-premises. If you believe that equation is going to flip, which we do, it’s going to move to the cloud. And having the cloud infrastructure offering with the broadest functionality by a fair bit, the best securing operational performance and the largest partner ecosystem bodes well for us moving forward. But we’re not close to being done inventing in AWS. Our recent announcement on Large Language Models and generative AI and the chips and managed services associated with them is another recent example. And in my opinion, few folks appreciate how much new cloud business will happen over the next several years from the pending deluge of machine learning that’s coming. This past year has seen us do a fair bit of cost streamlining. As I mentioned in my recent shareholder letter, we took a deep look across the Company and asked ourselves whether we had conviction about each initiative’s long-term potential to drive enoughthat seemed really different for us when we started to pursue that in 2003. And we’re a pretty different company because we did so, even though there were a lot of people externally and internally that thought was a little bit crazy. And so I just chose two of them there. I could have chosen a lot more. The letter was long enough as it was, so I just chose two but I chose two that we have conviction about. 
On the health care side, when you think about that set of questions that we ask ourselves when we consider whether we should make big investments, health care is a multitrillion dollar business that’s very segmented, and it’s really broken in the U.S. particularly, I think in other parts of the world, too, but particularly in the U.S. And we had what we thought were some differentiated ways that we could be successful at. And I think when -- our customers have been asking us for years to provide a pharmacy. And if you think about that, it’s not -- that’s a pretty natural extension from what we do in retail, and we’ve launched Amazon Pharmacy in 2020 and I think it’s off to a good start. It’s continuing to grow. We have a lot to do there. But a lot of our customers who like that experience said, "Gosh, I wish you guys would help us in the broader health care experience." And if you think about trying to meaningfully change that experience, primary care is right at the center of it. And if you look at the experience that’s been the case for the last several decades, we’rethere’s three macro areas in this space. If you think about maybe the bottom layer here, is that all of the Large Language Models are going to run on compute. And the key to that compute is going to be the chip that’s in that compute. And to date, I think a lot of the chips there, particularly GPUs, which are optimized for this type of workload, they’re expensive and they’re scarce. It’s hard to find enough capacity. And so, in AWS, we’ve been working for several years on building customized machine learning chips, and we built a chip that’s specialized for training -- machine learning training, which we call Trainium, a chip that’s specialized for inference or the predictions that come from the model called Inferentia. The reality, by the way, is that most people are spending most of their time and money on the training. 
But as these models graduate to production, where they’re in the apps, all the spend is going to be in inference. So, they both matter a lot. And if you look at -- we just released our second versions of both Trainium and Inferentia. And the combination of price and performance that you can get from those chips is pretty differentiated and very significant. So we think that a lot of that machine learning training, inference will run on AWS. Then if you think about -- so you have to train the models, you have to run the inference, then you got to -- but you have to build the models. And if you look at the really significant leading Large Language Models,

Brian Olsavsky: and shipping services are a key contributor to the selection offered to customers. We also continued to invest meaningfully in brand protection efforts, including industry-leading technology, so that sellers can trust we will provide a great selling experience free from bad actors. Sellers comprised 59% of overall unit sales in Q1, up from 55% one year ago. We also saw strong engagement in our advertising services with revenue up 23% year-over-year, excluding the impact from changes in foreign exchange rates. In particular, our sponsored product and brand offerings remain a key driver of growth as we work with advertisers to help customers make more informed purchase decisions. Our teams remain focused on delivering performance through our comprehensive and flexible measurement capabilities along with insights that allow advertisers the ability to measure the return on their advertising spend and help them grow their business. In AWS, net sales were $21.4 billion in the first quarter, up 16% year-over-year and representing an annualized sales run rate of more than $85 billion. Given the ongoing economic uncertainty, customers of all sizes in all industries continue to look for cost savings across their businesses, similar to what you’ve seen us doing at Amazon. As expected, customers continue to evaluate ways to optimize their cloud spending in response to these tough economic conditions in the first quarter. And we are seeing these optimizations continue into the secondYes. Hi Justin. Thank you. On AWS, I think Andy did a good job of laying out the dynamics we’re seeing in -- among customers right now and where they’re cutting workloads and continued strength that we see in customers hitting their contractual limits and extending them and planning for the future. So, we feel really strongly about the outlook for the business and understand the short-term work that we’re doing to help customers save money. 
So, I would say Q2 versus Q1, there’s not an obvious year-over-year comp differential. It’s just, again, understanding which customers are cutting in some areas and growing in others and helping them get on hopefully to the new initiatives that they are planning as well.

DO TRY THE NOTEBOOK

This brings us to the end of our tutorial on ColBERT. I hope you liked it, and if you have any queries, I am one comment away.


Astarag Mohapatra

Hi, Astarag here. I am interested in deep learning and related topics. If you have any queries, I am one comment away.