The tremendous increase in the amount of multimedia data in general, and of video databases in particular, has increased the need for effective indexing and retrieval mechanisms. In most cases, video retrieval is based on user-assigned tags and not on the actual content of the video. The proposed research aims to develop a content-based video indexing and retrieval system. The caption text appearing in videos will be used as the primary index, while the audio content of these videos will serve as the secondary index. The text module will detect and extract occurrences of textual content in video independently of the script in which the text is written. Once extracted, the text will be fed to a script recognition module that identifies its script, so that subsequent processing is carried out by the module dedicated to that script/language. Indexing can be implemented either through recognition of the text or through a word-spotting technique. The former requires a video OCR, while the latter requires clustering ‘similar’ shapes (words) into classes and then matching query words against them using shape matching algorithms. Either solution could be developed to target a defined vocabulary of (key)words. The audio indexing module will identify occurrences of the vocabulary keywords in the audio stream of the video. Once the videos are indexed, a user may provide a query keyword and retrieve all frames of all videos containing textual and/or spoken occurrences of that word.
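As an illustration only, the indexing and retrieval step described above can be sketched as a small inverted index over a fixed keyword vocabulary. This is a minimal sketch under stated assumptions, not the proposed system: the names `Occurrence` and `KeywordIndex` are hypothetical, the detections are assumed to arrive from upstream text (OCR or word spotting) and audio keyword-spotting modules that are not shown, and keywords are matched after simple lowercasing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One detected keyword occurrence (hypothetical record layout)."""
    video_id: str
    frame: int    # frame number at which the keyword was detected
    source: str   # "text" (caption text, primary) or "audio" (spoken, secondary)


class KeywordIndex:
    """Inverted index from vocabulary keywords to their occurrences."""

    def __init__(self, vocabulary):
        # Assumption: keywords are compared case-insensitively.
        self.vocabulary = {word.lower() for word in vocabulary}
        self._postings = defaultdict(set)

    def add(self, keyword, video_id, frame, source):
        """Record a detection; return False if the word is out of vocabulary."""
        if source not in ("text", "audio"):
            raise ValueError("source must be 'text' or 'audio'")
        key = keyword.lower()
        if key not in self.vocabulary:
            return False
        self._postings[key].add(Occurrence(video_id, frame, source))
        return True

    def query(self, keyword, sources=("text", "audio")):
        """Return occurrences of keyword, caption-text hits before audio hits."""
        hits = self._postings.get(keyword.lower(), set())
        selected = [occ for occ in hits if occ.source in sources]
        # Primary index (text) first, then ordered by video and frame.
        return sorted(
            selected,
            key=lambda occ: (occ.source != "text", occ.video_id, occ.frame),
        )
```

A query such as `KeywordIndex(["election"]).query("election")` returns an empty list until detections are added with `add`; passing `sources=("audio",)` restricts retrieval to spoken occurrences. Ranking caption-text hits ahead of audio hits is one possible reading of "primary" and "secondary" index and is an assumption of this sketch.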