myownskyW7 committed on
Commit
3a3f54b
·
verified ·
1 Parent(s): f96ea5f

Update README.md

  1. README.md +154 -5
README.md CHANGED
@@ -1,5 +1,154 @@
1
- ---
2
- license: other
3
- license_name: other
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ pipeline_tag: visual-question-answering
4
+ ---
5
+
6
+
7
+ <p align="center">
8
+ <img src="logo_en.png" width="600"/>
9
+ </p>
10
+
11
+ <p align="center">
12
+ <b><font size="6">InternLM-XComposer-2.5-Chat</font></b>
13
+ </p>
14
+
15
+ <div align="center">
16
+
17
+ [💻Github Repo](https://github.com/InternLM/InternLM-XComposer)
18
+
19
+ [Online Demo](https://huggingface.co/spaces/Willow123/InternLM-XComposer)
20
+
21
+ [Paper](https://huggingface.co/papers/2407.03320)
22
+
23
+ </div>
24
+
25
+ **InternLM-XComposer2.5-Chat** is a chat model trained on [internlm/internlm-xcomposer2d5-7b](https://huggingface.co/internlm/internlm-xcomposer2d5-7b),
26
+ and offers improved multi-modal instruction following and open-ended dialogue capabilities.
27
+
28
+ ### Import from Transformers
29
+ To load the InternLM-XComposer2.5-Chat model using Transformers, use the following code:
30
+ ```python
31
+ import torch
32
+ from transformers import AutoTokenizer, AutoModelForCausalLM
33
+ ckpt_path = "internlm/internlm-xcomposer2d5-7b-chat"
34
+ tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
35
+ # Set `torch_dtype=torch.bfloat16` to load the model in bfloat16; otherwise it is loaded as float32 and might cause an OOM error.
36
+ model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
37
+ model = model.eval()
38
+ ```
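The comment in the loading code warns that float32 may run out of memory. As a rough back-of-the-envelope sketch, the memory taken by the weights alone is approximately the parameter count times the bytes per parameter. The parameter count of 7e9 used here is an assumption read off the "7b" in the checkpoint name, not a published figure, and the estimate ignores activations, the KV cache, and framework overhead. `estimate_weight_gib` is an illustrative helper, not part of Transformers or of this repository.

```python
# Rough weight-memory estimate. `estimate_weight_gib` is a hypothetical helper
# written for illustration; it is not part of Transformers or InternLM-XComposer.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def estimate_weight_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: about 7e9 parameters, inferred from "7b" in the checkpoint name.
ASSUMED_PARAMS = 7e9

for dtype in ("float32", "bfloat16"):
    gib = estimate_weight_gib(ASSUMED_PARAMS, dtype)
    print(f"{dtype}: ~{gib:.1f} GiB for weights alone")
```

Under that assumption, bfloat16 halves the weight footprint relative to float32, which is why the loading code passes `torch_dtype=torch.bfloat16`.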
39
+
40
+ ## Quickstart
41
+
42
+ We provide simple examples to show how to use InternLM-XComposer2.5 with 🤗 Transformers.
43
+
44
+ <details>
45
+ <summary>
46
+ <b>Video Understanding</b>
47
+ </summary>
48
+
49
+ ```python
50
+ import torch
51
+ from transformers import AutoModel, AutoTokenizer
52
+
53
+ torch.set_grad_enabled(False)
54
+
55
+ # init model and tokenizer
56
+ model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b-chat', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
57
+ tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b-chat', trust_remote_code=True)
58
+ model.tokenizer = tokenizer
59
+
60
+ query = 'Here are some frames of a video. Describe this video in detail'
61
+ image = ['./examples/liuxiang.mp4',]
62
+ with torch.autocast(device_type='cuda', dtype=torch.float16):
63
+ response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
64
+ print(response)
65
+ #The video opens with a shot of an athlete, dressed in a red and yellow uniform with the word "CHINA" emblazoned across the front, preparing for a race.
66
+ #The athlete, Liu Xiang, is seen in a crouched position, focused and ready, with the Olympic rings visible in the background, indicating the prestigious setting of the Olympic Games. As the race commences, the athletes are seen sprinting towards the hurdles, their determination evident in their powerful strides.
67
+ #The camera captures the intensity of the competition, with the athletes' numbers and times displayed on the screen, providing a real-time update on their performance. The race reaches a climax as Liu Xiang, still in his red and yellow uniform, triumphantly crosses the finish line, his arms raised in victory.
68
+ #The crowd in the stands erupts into cheers, their excitement palpable as they witness the athlete's success. The video concludes with a close-up shot of Liu Xiang, still basking in the glory of his victory, as the Olympic rings continue to symbolize the significance of the event.
69
+
70
+ query = 'tell me the athlete code of Liu Xiang'
71
+ image = ['./examples/liuxiang.mp4',]
72
+ with torch.autocast(device_type='cuda', dtype=torch.float16):
73
+ response, _ = model.chat(tokenizer, query, image, history=his, do_sample=False, num_beams=3, use_meta=True)
74
+ print(response)
75
+ #The athlete code of Liu Xiang, as displayed on his uniform in the video, is "1363".
76
+ ```
77
+
78
+ </details>
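The video example carries the conversation forward by passing the history returned from the first `model.chat` call into the second. That pattern can be wrapped in a small helper, sketched below. `ChatSession` is a hypothetical convenience class, not part of the model's API; it only assumes that the callable it wraps behaves like the `model.chat` calls shown above, accepting `(tokenizer, query, image, history=..., **kwargs)` and returning `(response, history)`.

```python
from typing import Any, Callable, List, Optional


class ChatSession:
    """Hypothetical wrapper that threads `history` through successive chat calls.

    Assumes `chat_fn` has the call shape used in the examples above:
    chat_fn(tokenizer, query, image, history=..., **kwargs) -> (response, history)
    """

    def __init__(self, chat_fn: Callable[..., Any], tokenizer: Any,
                 images: List[str], **gen_kwargs: Any) -> None:
        self.chat_fn = chat_fn
        self.tokenizer = tokenizer
        self.images = images
        self.gen_kwargs = gen_kwargs
        self.history: Optional[Any] = None  # no history before the first turn

    def ask(self, query: str) -> str:
        kwargs = dict(self.gen_kwargs)
        # Mirror the examples: the first turn omits `history`, later turns pass it.
        if self.history is not None:
            kwargs["history"] = self.history
        response, self.history = self.chat_fn(
            self.tokenizer, query, self.images, **kwargs
        )
        return response


# Usage sketch (requires the model and tokenizer loaded as in the examples):
# session = ChatSession(model.chat, tokenizer, ['./examples/liuxiang.mp4'],
#                       do_sample=False, num_beams=3, use_meta=True)
# with torch.autocast(device_type='cuda', dtype=torch.float16):
#     print(session.ask('Here are some frames of a video. Describe this video in detail'))
#     print(session.ask('tell me the athlete code of Liu Xiang'))
```

Keeping the history inside one object avoids accidentally reusing a stale `his` variable when several conversations run in the same script.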
79
+
80
+ <details>
81
+ <summary>
82
+ <b>Multi-Image Multi-Turn Dialog</b>
83
+ </summary>
84
+
85
+ ```python
86
+ import torch
87
+ from transformers import AutoModel, AutoTokenizer
88
+
89
+ torch.set_grad_enabled(False)
90
+
91
+ # init model and tokenizer
92
+ model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b-chat', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
93
+ tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b-chat', trust_remote_code=True)
94
+ model.tokenizer = tokenizer
95
+
96
+ query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
97
+ image = ['./examples/cars1.jpg',
98
+ './examples/cars2.jpg',
99
+ './examples/cars3.jpg',]
100
+ with torch.autocast(device_type='cuda', dtype=torch.float16):
101
+ response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
102
+ print(response)
103
+ #To analyze the advantages and disadvantages of each car, we need to consider factors such as brand reputation, performance, design, cost, and maintenance.
104
+ #1. Mercedes-Benz: - Advantages: Known for its luxury and reliability, the Mercedes-Benz brand offers a high level of comfort, advanced technology, and superior craftsmanship. The vehicle in the image appears to be an SUV, which is versatile for both city driving and off-road conditions. - Disadvantages: Typically, Mercedes-Benz vehicles are more expensive compared to other brands, and they may require more frequent maintenance due to their luxury status.
105
+ #2. Bugatti: - Advantages: Bugatti is renowned for producing some of the fastest and most powerful cars in the world. The vehicle in the image is a sports car, likely offering an exhilarating driving experience with its high-performance engine and advanced aerodynamics. - Disadvantages: Bugatti cars are extremely expensive, making them less accessible to the average consumer. They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature.
106
+ #3. BMW: - Advantages: BMW is a well-respected manufacturer known for its innovative engineering and design. The car in the image appears to be an electric vehicle (EV), which offers advantages such as lower emissions, reduced noise pollution, and potentially lower operating costs. - Disadvantages: EVs like the one in the image may have a shorter range compared to traditional gasoline-powered vehicles, limiting their suitability for long-distance travel.
107
+ #Additionally, the initial cost of purchasing an EV can be higher than that of a conventional car. In conclusion, each car has its own set of advantages and disadvantages. The Mercedes-Benz offers luxury and versatility, the Bugatti provides high performance and exclusivity, and the BMW represents innovation in electric vehicle technology. The choice ultimately depends on the buyer's priorities, such as comfort, performance, environmental impact, and budget.
108
+
109
+ query = 'Image4 <ImageHere>; How about the car in Image4'
110
+ image.append('./examples/cars4.jpg')
111
+ with torch.autocast(device_type='cuda', dtype=torch.float16):
112
+ response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, history=his, use_meta=True)
113
+ print(response)
114
+ #The car in Image4 is a red sports car, which appears to be a Ferrari. Ferrari is a renowned Italian manufacturer known for producing some of the most iconic and high-performance sports cars in the world. - Advantages: Ferrari vehicles are synonymous with speed, luxury, and engineering excellence.
115
+ #The car in the image likely offers an exhilarating driving experience with its powerful engine, advanced aerodynamics, and high-quality craftsmanship. The red color adds to the car's aesthetic appeal, making it stand out on the road. - Disadvantages: Ferrari cars are extremely expensive, making them less accessible to the average consumer.
116
+ #They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. In conclusion, the Ferrari in Image4 represents a pinnacle of automotive engineering and design, offering unmatched performance and luxury.
117
+ #However, its high cost and specialized maintenance requirements make it less practical for everyday use compared to the other vehicles in the images.
118
+ ```
119
+
120
+
121
+ </details>
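Both queries in the multi-image example prefix the question with one `ImageN <ImageHere>;` tag per image, numbered in the order the images appear in the list. The helper below is a minimal sketch of building that prefix programmatically. `build_multi_image_query` is a hypothetical name, and the format is inferred only from the two example queries above; whether the model requires exactly this layout is not stated in this README.

```python
def build_multi_image_query(num_images: int, question: str, start_index: int = 1) -> str:
    """Build a query with one 'ImageN <ImageHere>' tag per image.

    The tag layout is copied from the example queries above; it is an
    inferred convention, not a documented requirement.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    tags = [
        f"Image{i} <ImageHere>"
        for i in range(start_index, start_index + num_images)
    ]
    return "; ".join(tags) + "; " + question


# First turn: three images, numbered from 1.
print(build_multi_image_query(3, "I want to buy a car from the three given cars, "
                                 "analyze their advantages and weaknesses one by one"))
# Follow-up turn: one new image, continuing the numbering at 4.
print(build_multi_image_query(1, "How about the car in Image4", start_index=4))
```

The `start_index` argument covers the follow-up turn, where a fourth image is appended to the list and only its tag appears in the new query.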
122
+
123
+ <details>
124
+ <summary>
125
+ <b>High Resolution Image Understanding</b>
126
+ </summary>
127
+
128
+ ```python
129
+ import torch
130
+ from transformers import AutoModel, AutoTokenizer
131
+
132
+ torch.set_grad_enabled(False)
133
+
134
+ # init model and tokenizer
135
+ model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b-chat', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
136
+ tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b-chat', trust_remote_code=True)
137
+ model.tokenizer = tokenizer
138
+
139
+ query = 'Analyze the given image in a detail manner'
140
+ image = ['./examples/dubai.png']
141
+ with torch.autocast(device_type='cuda', dtype=torch.float16):
142
+ response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
143
+ print(response)
144
+ #The infographic is a visual representation of various facts about Dubai. It begins with a statement about Palm Jumeirah, highlighting it as the largest artificial island visible from space. It then provides a historical context, noting that in 1968, there were only a few cars in Dubai, contrasting this with the current figure of more than 1.5 million vehicles.
145
+ #The infographic also points out that Dubai has the world's largest Gold Chain, with 7 of the top 10 tallest hotels located there. Additionally, it mentions that the crime rate is near 0%, and the income tax rate is also 0%, with 20% of the world's total cranes operating in Dubai. Furthermore, it states that 17% of the population is Emirati, and 83% are immigrants.
146
+ #The Dubai Mall is highlighted as the largest shopping mall in the world, with 1200 stores. The infographic also notes that Dubai has no standard address system, with no zip codes, area codes, or postal services. It mentions that the Burj Khalifa is so tall that its residents on top floors need to wait longer to break fast during Ramadan.
147
+ #The infographic also includes information about Dubai's climate-controlled City, with the Royal Suite at Burj Al Arab costing $24,000 per night. Lastly, it notes that the net worth of the four listed billionaires is roughly equal to the GDP of Honduras.
148
+
149
+ ```
150
+
151
+ </details>
152
+
153
+ ### Open Source License
154
+ The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact [email protected].