Llava - Gen AI for image analysis

This blog post is about one of the latest advancements in the field of generative AI. It showcases a new model called Llava, which can be run offline to analyse images and recognise text in them.

AI IN THE CONSTRUCTION INDUSTRY

Mohamed Ashour

12/23/2023 · 2 min read

Overview

😊 Another great advancement in open source large language models (LLMs) has been achieved with the release of the latest Llava LLM. ☺️

🛠️🛠️ Briefly, Llava is a multimodal LLM aimed at analysing images and deriving meaningful insights from them. It replicates some of the image-understanding functionality of GPT-4, but in an open source manner. It is licensed for non-commercial use only, so it is not yet available for commercial use. It does not yet have the capability of generating images from text: so far it works in one direction only (taking an image and explaining it in paragraphs). 🛠️🛠️

➡️ Here is a link to an introduction about the model and its capabilities: https://llava-vl.github.io/ ⬅️
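To make the "image in, paragraph out" idea concrete, here is a minimal sketch of how an image could be queried. It is not taken from the Llava project page: it assumes the Hugging Face transformers port of the model, the community checkpoint name "llava-hf/llava-1.5-7b-hf", a CUDA GPU, and the "USER: ... ASSISTANT:" prompt format used by that checkpoint. The function names build_prompt and describe_image are made up for illustration.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the chat format assumed for the LLaVA-1.5 checkpoints."""
    return f"USER: <image>\n{question.strip()} ASSISTANT:"


def describe_image(image_path: str, question: str,
                   model_id: str = "llava-hf/llava-1.5-7b-hf") -> str:
    """Ask the model a question about one image (assumes a CUDA GPU)."""
    # Heavy imports live inside the function so that this file can be
    # imported even when torch / transformers are not installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(question), images=image, return_tensors="pt"
    ).to("cuda", torch.float16)

    output_ids = model.generate(**inputs, max_new_tokens=200)
    text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    # The decoded text echoes the prompt, so keep only the model's answer.
    return text.split("ASSISTANT:")[-1].strip()
```

A call such as describe_image("site_photo.jpg", "What is shown in this image?") would then return the model's description as plain text (the file name here is only a placeholder).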

Relevance to the construction industry

⁉️ So, how would that LLM be beneficial to us in the construction industry?

💡Well, it provides a wide range of functionalities:

✅ Converting written timesheets into readily usable text.

✅ Converting invoices into numeric tables.

✅ Open source solution that could be trained further in-house.
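As a rough illustration of the invoice use case, the sketch below turns a text reply from the model into a numeric table. The reply format (one "description | quantity | unit price" row per line) is purely an assumption: in practice you would have to ask the model for that layout in the prompt and check what it actually returns. The function name parse_invoice_lines is made up for this example.

```python
from decimal import Decimal, InvalidOperation


def parse_invoice_lines(reply: str) -> list[dict]:
    """Parse 'description | quantity | unit price' rows into numeric records."""
    rows = []
    for line in reply.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # not a table row (free text, blank line, ...)
        description, quantity, unit_price = parts
        try:
            qty = Decimal(quantity)
            # Drop thousands separators and a leading currency symbol.
            price = Decimal(unit_price.replace(",", "").lstrip("£$"))
        except InvalidOperation:
            continue  # header row or a cell the model garbled
        rows.append({
            "description": description,
            "quantity": qty,
            "unit_price": price,
            "total": qty * price,
        })
    return rows


if __name__ == "__main__":
    sample_reply = (
        "Item | Qty | Unit price\n"
        "Cement bags | 40 | £6.50\n"
        "Rebar 12mm | 120 | 4.25\n"
    )
    for row in parse_invoice_lines(sample_reply):
        print(row)
```

Decimal is used instead of float so that money values do not pick up rounding noise. Rows that fail to parse are skipped silently here; for real invoices you would probably want to log them for manual review.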

Generative AI landscape

❔So, the million dollar question is, what is the current landscape of deployable GenAI solutions? 🏙️

💡The GenAI landscape now aims to replicate GPT-4 by providing open source alternatives in the following areas:

👉 Text generation, reasoning and analysis. I managed to build multiple chatbots for this purpose, as you can find in this link:
https://www.linkedin.com/posts/mohamed-ashour-0727_mistral-llms-python-activity-7127036152062070784-cBT8/?utm_source=share&utm_medium=member_desktop

👉 Image analysis and reasoning. This is now available through Llava. I am going to develop a chatbot to query the images.

👉 Image generation from text prompts. This is available mainly through paid solutions such as GPT-4, Midjourney and others.

👉 Converting speech into text. This is available through open source libraries such as SpeechRecognition and pydub (https://thepythoncode.com/article/using-speech-recognition-to-convert-speech-to-text-python)

👉 Creating agents responsible for various tasks. This is already available using the current LLMs, whether open source or closed source (see the GPT4All post: https://www.linkedin.com/posts/mohamed-ashour-0727_llms-gpt4all-activity-7134489959113093120-1DQM/?utm_source=share&utm_medium=member_desktop)

Limitations

‼Last but not least, it is not all rosy. There are some drawbacks:

👉 You need to have a general understanding of Python and how to use virtual environments.

👉 A large GPU is needed for fast inference. At least 24 GB of GPU memory is recommended, which means a card such as the RTX 3090 or 4090. Hence, the computing cost is going to be quite high (more than £1,500 for the GPU alone). Laptops will most probably run out of GPU memory.

👉 It is still not available for commercial use. It is also still under training, so take the results with a pinch of salt.
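To see where a GPU memory recommendation of this size comes from, here is a back-of-the-envelope calculation. It assumes the 7B and 13B parameter sizes in which Llava has been released (these sizes are not stated in this post) and counts the language model weights only. Activations, the vision encoder and the generation cache all need extra memory, so treat the numbers as a lower bound.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Assumed model sizes; 16-bit is the usual half-precision format,
    # 8-bit and 4-bit correspond to quantised weights.
    for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(n_params, bits)
            print(f"{name} model at {bits}-bit: about {gb:.1f} GB of weights")
```

Under these assumptions the 7B model needs about 14 GB in 16-bit precision and the 13B model about 26 GB, which is why a 24 GB card is a comfortable fit for the smaller model and why the larger one typically needs quantised weights on a single consumer GPU.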

Let me know your thoughts and stay tuned for further AI news and deployable solutions.

#LLMs

#AI

#Chatbots

#Image_recognition

#GenAI