Framework Model Layer
This section implements an inference pipeline for a ResNet model on a Sophgo chip, using the Sophon SAIL library.
- json is used to read the configuration file, numpy to handle the input data, and sophon.sail to run model inference on the Sophgo platform. sys.path.append dynamically extends the module search path so that BaseModel can be imported correctly.
- The class inherits from BaseModel; its constructor calls the parent constructor via super().__init__('vision/classification/resnet').
- input_shape: the input image shape (1, 3, 256, 256), i.e. one image with 3 channels (RGB) at 256x256 pixels. model_path: the path to the model file, here pointing to the ResNet .bmodel file.
- astype(np.float32) converts the generated data to 32-bit floating point, the data type commonly used for model inputs.
- sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO) loads the model file onto the specified device with the I/O mode set to SYSIO, i.e. system-memory input and output. The graph name self.graph_name is then retrieved; a Sophgo model file may contain multiple computation graphs, and usually only the first one is needed.
- The configuration file config.json is read, and the model's parameter count and FLOPs (floating-point operations) are looked up by the model identifier (self.model_identifier); a sketch of the expected layout follows this list.
- Inference runs on the loaded model by calling self.model.process(self.graph_name, self.input_data_dict) with the graph name and the input data; the result output is returned.
- An instance resnet_model of the resnet_sophgo class is created, and its methods are called in order: prepare the input data, load the model, fetch the parameter count and FLOPs, and finally run inference.
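The layout of config.json is not shown in this section; a minimal sketch of the structure that get_params_flops expects, assuming model_identifier equals the string passed to the constructor and reusing the ResNet values reported in the results below:
{
    "model_info": {
        "vision/classification/resnet": {
            "Params(M)": 25.557032,
            "FLOPs(G)": 10.797092864
        }
    }
}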
import json
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../../../../..')))
from model.model_set.model_base import BaseModel
import numpy as np
import sophon.sail as sail
class resnet_sophgo(BaseModel):
def __init__(self):
super().__init__('vision/classification/resnet')
self.devices = 0
self.input_shape = (1, 3, 256, 256)
self.model_path = '/home/aii-works/Benchmark_0822/model/bmodel/vision/classification/resnet/resnet_1684x_f32.bmodel'
def get_input(self):
self.image_input = np.random.randn(*self.input_shape).astype(np.float32)
def load_model(self):
self.model = sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO)
self.graph_name = self.model.get_graph_names()[0]
input_name_img = self.model.get_input_names(self.graph_name)
self.input_data_dict = {input_name_img[0]: self.image_input}
def get_params_flops(self) -> list:
'float [params, flops]'
with open('config.json', 'r') as file:
config = json.load(file)
model_info = config.get('model_info', {}).get(self.model_identifier, {})
params = model_info.get('Params(M)', 'Not available')
flops = model_info.get('FLOPs(G)', 'Not available')
return [params, flops]
def inference(self):
output = self.model.process(self.graph_name, self.input_data_dict)
return output
def main():
# Create an instance of the resnet_sophgo class
resnet_model = resnet_sophgo()
# Step 1: Prepare the input data
print("Preparing input data...")
resnet_model.get_input()
# Step 2: Load the model
print("Loading model...")
resnet_model.load_model()
# Step 3: Retrieve model parameters and FLOPs
print("Fetching model parameters and FLOPs...")
params_flops = resnet_model.get_params_flops()
print(f"Model Parameters (M): {params_flops[0]}")
print(f"Model FLOPs (G): {params_flops[1]}")
# Step 4: Perform inference
print("Running inference...")
output = resnet_model.inference()
print("Inference success")
if __name__ == "__main__":
main()
Results
Loading model...
open usercpu.so, init user_cpu_init
Fetching model parameters and FLOPs...
Model Parameters (M): 25.557032
Model FLOPs (G): 10.797092864
Running inference...
Inference success
This implements an inference pipeline for a BERT model, executed mainly on a Sophgo chip using PyTorch and the Sophon SAIL library.
- BertTokenizer and BertModel: classes imported from the transformers library for processing text and loading the BERT model. sophon.sail: runs model inference on the Sophgo platform.
- The parent constructor is called via super().__init__('language/nlp/bert') with the model identifier. The device id is set to 0, selecting the device to use (usually the first one). The model file path model_path and the tokenizer path tokenizer_path are set.
- The text to process, self.text, is defined first, and the maximum sequence length is set to 256. The BERT tokenizer is loaded from tokenizer_path, and the text is converted into the model's input format: return_tensors='pt' returns PyTorch tensors, padding='max_length' pads to the maximum length, and truncation=True truncates text that exceeds it.
- The Sophon SAIL Engine class loads the model with the given path and device and the I/O mode set to SYSIO. The graph name self.graph_name is retrieved (a model file may contain multiple computation graphs; the first one is used). The input tensor name input_name_img is fetched, and the input data (self.input_ids) is stored in the dictionary self.input_data_dict for the subsequent inference.
- Inference runs on the loaded model via self.model.process with the graph name and the input data; the result output is returned. The sketch after this list shows how to check the bmodel's expected input shapes and dtypes before building the input dictionary.
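Because the float32 cast of input_ids only works if it matches what the bmodel was compiled with, it can help to inspect the engine first. A minimal sketch, assuming the SAIL Engine exposes get_input_shape and get_input_dtype as in recent sophon.sail releases (these calls are not used in the code below):
import sophon.sail as sail

# Load the same bmodel and print each input tensor's expected shape and dtype,
# so the host-side numpy array can be matched to it before calling process().
engine = sail.Engine('model/model_set/bmodel/language/nlp/bert/bert4torchf32.bmodel', 0, sail.IOMode.SYSIO)
graph = engine.get_graph_names()[0]
for name in engine.get_input_names(graph):
    print(name, engine.get_input_shape(graph, name), engine.get_input_dtype(graph, name))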
import torch
import json
from model.model_set.model_base import BaseModel
from transformers import BertTokenizer, BertModel
import sophon.sail as sail
class bert_sophgo(BaseModel):
def __init__(self):
super().__init__('language/nlp/bert')
self.devices = 0
self.model_path = 'model/model_set/bmodel/language/nlp/bert/bert4torchf32.bmodel'
self.tokenizer_path = "model/model_set/pytorch/language/nlp/bert/vocab"
def get_input(self):
self.text = "Hello, how are you?"
self.max_length = 256
self.tokenizer = BertTokenizer.from_pretrained(self.tokenizer_path)
self.inputs = self.tokenizer(self.text, return_tensors='pt', padding='max_length',
truncation=True, max_length=self.max_length)
self.input_ids = self.inputs['input_ids'].to(dtype=torch.float32).numpy()
def load_model(self):
self.model = sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO)
self.graph_name = self.model.get_graph_names()[0]
input_name_img = self.model.get_input_names(self.graph_name)
self.input_data_dict = {input_name_img[0]: self.input_ids}
def get_params_flops(self) -> list:
'float [params, flops]'
with open('config.json', 'r') as file:
config = json.load(file)
model_info = config.get('model_info', {}).get(self.model_identifier, {})
params = model_info.get('Params(M)', 'Not available')
flops = model_info.get('FLOPs(G)', 'Not available')
return [params, flops]
def inference(self):
output = self.model.process(self.graph_name, self.input_data_dict)
return output
def main():
# Instantiate the model class
bert_model = bert_sophgo()
# Step 1: Get input
bert_model.get_input()
# Step 2: Load the model
bert_model.load_model()
print("Model loaded.")
# Step 3: Perform inference
output = bert_model.inference()
# Step 4: Get model parameters and FLOPs
params_flops = bert_model.get_params_flops()
print(f"Model Parameters (in millions): {params_flops[0]}")
print(f"Model FLOPs (in billions): {params_flops[1]}")
if __name__ == "__main__":
main()
Results
Model loaded.
Model Parameters (in millions): 109.48224
Model FLOPs (in billions): 43.52704512
This implements inference for the CLIP (Contrastive Language-Image Pre-training) model using Sophon SAIL.
- The __init__ method initializes the class and calls the parent constructor with the model identifier 'multimodality/classification/clip'.
- self.text: the list of text labels to encode and process. self.input_shape: the input image shape (batch size, channels, height, width), here a 1x3x224x224 tensor. self.text_net_batch_size: the batch size of the text network, set to 1. self.device: uses the GPU if a CUDA device is available, otherwise the CPU. self.image_model_path: the path to the image model file. self.text_model_path: the path to the text model file.
- self.image_input: a random float array with the input shape. self.text_input: the tokenized text passed through the encode_text method.
- sail.Engine: loads the model from the given path. get_graph_names: retrieves the model's graph names. get_input_names: retrieves the model's input names, from which the input data dictionaries are built.
- The model's process method runs the forward pass and returns the results; see the sketch after this list for feeding every text slice through the text engine.
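Note that encode_text returns only a single slice of at most text_net_batch_size prompts, so with three labels and a batch size of 1 only one prompt reaches the text engine per call. A minimal sketch of a hypothetical helper (not part of the original code, assuming load_model has already run and that the tokenized prompts are a numpy array) that feeds every slice through the text engine and stacks the outputs:
import numpy as np

def encode_all_texts(clip, tokens):
    # tokens: tokenized prompts, shape (N, context_length)
    bs = clip.text_net_batch_size
    name = clip.text_net.get_input_names(clip.graph_name_text)[0]
    outputs = []
    for start in range(0, tokens.shape[0], bs):
        batch = tokens[start:start + bs]
        valid = batch.shape[0]
        if valid < bs:
            # Pad the last slice up to the batch size the bmodel was compiled for.
            pad = np.zeros((bs - valid, *batch.shape[1:]), dtype=batch.dtype)
            batch = np.concatenate([batch, pad], axis=0)
        out = clip.text_net.process(clip.graph_name_text, {name: batch})
        outputs.append(next(iter(out.values()))[:valid])  # drop padded rows
    return np.concatenate(outputs, axis=0)

# embeddings = encode_all_texts(clip_model, tokenize_tpu(clip_model.text))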
import torch
import json
import numpy as np
import sophon.sail as sail
from model.model_set.model_base import BaseModel
from model.model_set.models.multimodality.classification.clip.utils.simpletokenizer import tokenize_tpu
class clip_sophgo(BaseModel):
def __init__(self):
super().__init__('multimodality/classification/clip')
self.text = ["a diagram", "a dog", "a cat"]
self.input_shape = (1, 3, 224, 224)
self.text_net_batch_size = 1
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.image_model_path = 'model/model_set/bmodel/multimodality/classification/clip/clip_image_vitb32_bm1684x_f16.bmodel'
self.text_model_path = 'model/model_set/bmodel/multimodality/classification/clip/clip_text_vitb32_bm1684x_f16.bmodel'
def get_input(self):
self.image_input = np.random.randn(*self.input_shape).astype(np.float32)
self.text_input = self.encode_text(tokenize_tpu(self.text))
def load_model(self):
self.image_net = sail.Engine(self.image_model_path, 0, sail.IOMode.SYSIO)
self.text_net = sail.Engine(self.text_model_path, 0, sail.IOMode.SYSIO)
self.graph_name_img = self.image_net.get_graph_names()[0]
input_name_img = self.image_net.get_input_names(self.graph_name_img)
self.input_data_dict_img = {input_name_img[0]: self.image_input}
self.graph_name_text = self.text_net.get_graph_names()[0]
input_name_text = self.text_net.get_input_names(self.graph_name_text)
self.input_data_dict_text = {input_name_text[0]: self.text_input}
def encode_text(self, text):
text_batch = text.shape[0]
if text_batch > self.text_net_batch_size:
for start_idx in range(0, text_batch, self.text_net_batch_size):
end_idx = min(start_idx + self.text_net_batch_size, text_batch) # Ensure end_idx does not exceed text_batch
batch_slice = text[start_idx:end_idx]
if batch_slice.shape[0] < self.text_net_batch_size:
padding_size = self.text_net_batch_size - batch_slice.shape[0]
batch_slice = np.concatenate([batch_slice, np.zeros((padding_size, *batch_slice.shape[1:]), dtype=batch_slice.dtype)], axis=0)
return batch_slice
else:
return text
def get_params_flops(self) -> list:
'float [params, flops]'
with open('config.json', 'r') as file:
config = json.load(file)
model_info = config.get('model_info', {}).get(self.model_identifier, {})
params = model_info.get('Params(M)', 'Not available')
flops = model_info.get('FLOPs(G)', 'Not available')
return [params, flops]
def inference(self):
img_results = self.image_net.process(self.graph_name_img, self.input_data_dict_img)
txt_results = self.text_net.process(self.graph_name_text, self.input_data_dict_text)
return img_results, txt_results
def main():
# Create an instance of the clip_sophgo class
clip_model = clip_sophgo()
print("Preparing input data...")
clip_model.get_input()
print("Loading models...")
clip_model.load_model()
print("Models loaded.")
print("Fetching model parameters and FLOPs...")
params_flops = clip_model.get_params_flops()
print(f"Model Parameters (in millions): {params_flops[0]}")
print(f"Model FLOPs (in billions): {params_flops[1]}")
print("Running inference...")
img_results, txt_results = clip_model.inference()
print("Inference success.")
# Print the image and text inference results
print("Image results:", img_results)
print("Text results:", txt_results)
if __name__ == "__main__":
main()
Results
Models loaded.
Fetching model parameters and FLOPs...
Model Parameters (in millions): 151.277313
Model FLOPs (in billions): 17.520132096
Running inference...
Inference success.
Image results: {'output_MatMul_f32': array([[-1.69525146e-02, -6.65893555e-02, 2.46215820e-01,
5.56640625e-02, 7.07397461e-02, 1.19567871e-01,
-7.79418945e-02, 7.28027344e-01, 2.84912109e-01,
...,
9.34600830e-03, 7.61795044e-03, 2.84423828e-01,
-4.71923828e-01, 3.02001953e-01]], dtype=float32)}
Text results: {'output_LayerNormalization_f32': array([[[ 0.33911133, 0.11663818, 0.10198975, ..., 0.24694824,
0.5908203 , 0.10131836],
[ 1.9746094 , -0.58447266, 0.36865234, ..., 1.1679688 ,
0.8051758 , -0.9785156 ],
...,
[ 0.21704102, -0.34692383, -0.6845703 , ..., 0.5913086 ,
-0.08435059, -1.4951172 ],
[ 0.54345703, -0.23352051, -0.9902344 , ..., 0.09265137,
-0.04849243, -1.7587891 ]]], dtype=float32)}
This demonstrates model inference on either a GPU or a TPU, along with counting the model's FLOPs (floating-point operations) and parameters. It uses the ERNIE 3.0 model, runs inference according to the selected hardware mode (GPU or TPU), and finally measures inference performance metrics.
- os: checks and manipulates file paths. time: measures inference time (for latency and FPS). torch: the PyTorch library used to handle the model. requests: downloads the model weight file. transformers.BertTokenizer and ErnieModel: load the ERNIE 3.0 model and its tokenizer. tpu_perf.infer.SGInfer: runs TPU inference. thop.profile: computes the model's FLOPs. numpy: handles array data, particularly in TPU mode.
- The ernie3 class wraps the ERNIE 3.0 model and supports inference in GPU and TPU modes. mode: decides whether inference runs on the GPU or the TPU ('gpu' or 'tpu'). text: the text to run inference on. max_length: the maximum sequence length for the tokenizer. model_path: the path to the model weights. tokenizer_path: the path to the tokenizer configuration files.
- The tokenizer is loaded and the input text encoded into the format the model expects. In gpu mode, the device is set to CUDA if available (otherwise CPU); the model weights are checked and downloaded if missing, the ERNIE 3.0 model is loaded onto the device, and the encoded input is moved to the device as tensors for GPU inference. In tpu mode, the text is encoded and the input_ids are converted to a numpy array for TPU processing.
- In TPU mode, the SGInfer class loads the BModel file on the TPU. Inference is executed repeatedly (100 times in this example), and the per-inference latency (in milliseconds) and FPS (frames per second) are computed to evaluate the model's inference speed on the TPU; a worked check of this arithmetic follows this list.
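As a worked check of the latency/FPS arithmetic in the TPU branch (the elapsed time here is back-computed from the reported latency, so the numbers are illustrative):
iterations = 100
elapsed_time = 1.943                        # seconds for 100 inferences (illustrative)
latency = elapsed_time / iterations * 1000  # 19.43 ms per inference
fps = 1000 / latency                        # ~51.5 inferences per second (the run below reports 51.46)
print(f"FPS: {fps:.2f}, Latency: {latency:.2f} ms")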
import os
import time
import torch
import requests
from transformers import BertTokenizer, ErnieModel
from tpu_perf.infer import SGInfer
from thop import profile
import numpy as np
def download_model_weights(model_path):
if not os.path.exists(os.path.join(model_path, 'pytorch_model.bin')):
print("Weight file not found; downloading weights from Hugging Face...")
model_url = "https://huggingface.co/nghuyong/ernie-3.0-medium-zh/resolve/main/pytorch_model.bin?download=true"
response = requests.get(model_url)
if response.status_code == 200:
with open(os.path.join(model_path, 'pytorch_model.bin'), 'wb') as f:
f.write(response.content)
print("Weight download complete.")
else:
print("Weight download failed; check the network connection or the URL.")
class ernie3:
def __init__(self, mode='gpu', text="Hello, how are you?", max_length=256, model_path='/home/aii-works/Benchmark_refactoring/model/model_set/pytorch/language/nlp/ernie3/vocab', tokenizer_path='/home/aii-works/Benchmark_refactoring/model/model_set/pytorch/language/nlp/ernie3/vocab'):
self.mode = mode
self.text = text
self.max_length = max_length
self.tokenizer_path = tokenizer_path
self.model_path = model_path
self.tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
if mode == 'gpu':
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
download_model_weights(model_path)
self.model = ErnieModel.from_pretrained(model_path).to(self.device)
self.inputs = self.tokenizer(text=self.text, return_tensors='pt', padding='max_length', max_length=self.max_length).to(self.device)
elif mode == 'tpu':
self.inputs = self.tokenizer(text=text, return_tensors='pt', padding='max_length', max_length=max_length)
self.input_ids = self.inputs['input_ids'].numpy().astype(np.int32)
else:
raise ValueError("Mode should be either 'gpu' or 'tpu'")
def count_parameters_and_flops(self):
flops, _ = profile(self.model, (self.inputs.input_ids, self.inputs.attention_mask), verbose=False)
params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
return flops / 1e9 * 2, params / 1e6
def forward(self):
if self.mode == 'gpu':
outputs = self.model(**self.inputs)
return outputs
elif self.mode == 'tpu':
return self.input_ids
else:
raise ValueError("Mode should be either 'gpu' or 'tpu'")
if __name__ == '__main__':
mode = 'tpu'  # set to 'gpu' or 'tpu'
model = ernie3(mode=mode)
if mode == 'gpu':
for _ in range(1):
with torch.no_grad():
outputs = model.forward()
flops, params = model.count_parameters_and_flops()
print(f"FLOPs: {flops} GFLOPs")
print(f"Parameters: {params} Million")
elif mode == 'tpu':
bmodel_path = "/home/aii-works/Benchmark_refactoring/model/model_set/bmodel/language/nlp/ernie3/ernie3_1684x_f32.bmodel"
net = SGInfer(bmodel_path, devices=[0])
input = model.forward()
iterations = 100
t_start = time.time()
for _ in range(iterations):
output = net.infer_one(input)
elapsed_time = time.time() - t_start
latency = elapsed_time / iterations * 1000
FPS = 1000 / latency
print(f"FPS: {FPS:.2f}")
print(f"Latency: {latency:.2f} ms")
Results
[tid=30a25000] INFO: USING DEVICES: 0
[tid=30a25000] INFO: init context on device 0
open usercpu.so, init user_cpu_init
[tid=30a25000] INFO: NetName: ernie3
[tid=30a25000] INFO: Input 0) 'input_ids' shape=[ 1 256 ] dtype=INT32 scale=1
[tid=30a25000] INFO: Output 0) 'output_LayerNormalization' shape=[ 1 256 768 ] dtype=FLOAT32 scale=1
[tid=30a25000] INFO: Output 1) '855_Tanh' shape=[ 1 768 ] dtype=FLOAT32 scale=1
FPS: 51.46
Latency: 19.43 ms
This defines an image-segmentation class based on the UNet model that handles model loading, input generation, inference, and parameter retrieval, intended for deep-learning tasks on the Sophon AI framework.
- unet_sophgo: this class inherits from BaseModel and adds UNet-specific functionality.
- super().__init__('vision/segmentation/unet'): calls the base-class constructor with the model identifier, which sets common model attributes. self.devices: initialized to 0, the device id used for inference. self.input_shape: the input tensor shape, here a batch of 1 image with 3 color channels (RGB) at 640x640. self.model_path: the path to the UNet binary model file (.bmodel).
- self.model: a sail.Engine instance that loads the model from the given path with the chosen device and I/O mode. self.graph_name: the name of the model graph. input_name_img: the name of the input node. self.input_data_dict: a dictionary mapping the input node name to the input image tensor.
- get_params_flops: reads the configuration file and returns the model's parameter count (in millions) and FLOPs (floating-point operations, in billions).
- get_params_flops is called to fetch and print the parameters and FLOPs, and inference is called to run inference and print the output; a small post-processing sketch follows this list.
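The raw inference output is a logits map (the results below show a single 'output_Conv' tensor); a minimal sketch, assuming the usual NCHW logits layout, of turning it into a per-pixel class mask:
import numpy as np

def logits_to_mask(output):
    # output: dict returned by inference(), e.g. {'output_Conv': array of shape (1, C, H, W)}
    logits = next(iter(output.values()))
    return np.argmax(logits, axis=1)[0]  # (H, W) array of per-pixel class indices

# mask = logits_to_mask(unet_model.inference())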
import json
import numpy as np
from model.model_set.model_base import BaseModel
import sophon.sail as sail
class unet_sophgo(BaseModel):
def __init__(self):
super().__init__('vision/segmentation/unet')
self.devices = 0
self.input_shape = (1, 3, 640, 640)
self.model_path = 'model/model_set/bmodel/vision/segmentation/unet/unet_1684x_f32.bmodel'
def get_input(self):
self.image_input = np.random.randn(*self.input_shape).astype(np.float32)
def load_model(self):
self.model = sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO)
self.graph_name = self.model.get_graph_names()[0]
input_name_img = self.model.get_input_names(self.graph_name)
self.input_data_dict = {input_name_img[0]: self.image_input}
def get_params_flops(self) -> list:
'float [params, flops]'
with open('config.json', 'r') as file:
config = json.load(file)
model_info = config.get('model_info', {}).get(self.model_identifier, {})
params = model_info.get('Params(M)', 'Not available')
flops = model_info.get('FLOPs(G)', 'Not available')
return [params, flops]
def inference(self):
output = self.model.process(self.graph_name, self.input_data_dict)
return output
def main():
# Create an instance of the unet_sophgo class
unet_model = unet_sophgo()
# Prepare the input data
unet_model.get_input()
# Load the model
unet_model.load_model()
# Get the model parameters and FLOPs
params_flops = unet_model.get_params_flops()
print(f"Model Parameters: {params_flops[0]}M, FLOPs: {params_flops[1]}G")
# Run inference
output = unet_model.inference()
# Print the output
print("Inference Output:", output)
if __name__ == "__main__":
main()
Results
Model Parameters: 31.032915M, FLOPs: 683.5666944G
Inference Output: {'output_Conv': array([[[[ 1.4099784 , 0.43080187, -0.13301468, ..., 1.1241736 ,
1.2472477 , 1.9289322 ],
[ 0.25994158, -0.73382187, -0.9940162 , ..., -0.780355 ,
-0.5162163 , 0.37734842],
....
[-3.3069534 , -3.0841465 , -2.9631705 , ..., -2.723567 ,
-2.7471972 , -2.958744 ]]]], dtype=float32)}
This defines a Python class, stablediffusionv1_5_sophgo, that generates images with a Stable Diffusion model, and a main function that drives the image-generation process.
- super().__init__('multimodality/generative/stablediffusionv1_5'): calls the parent initializer with the model identifier. self.stage: the generation stage of the model (here "singlize", i.e. single-image generation). self.img_size: the size of the generated image, 512x512 pixels. self.model_path and self.tokenizer: the paths to the model and the tokenizer, respectively.
- self.prompt: the text prompt used to generate the image. self.scheduler: a PNDM scheduler instance that configures the diffusion sampling parameters.
- A StableDiffusionPipeline instance is created from the scheduler, model path, stage, tokenizer, and related parameters defined earlier.
- self.pipeline generates the image; its arguments include the prompt, image height and width, negative prompt, strength, number of inference steps, and guidance scale.
- An instance of stablediffusionv1_5_sophgo is created; get_input prepares the input parameters, load_model loads the model, inference runs the generation, and the resulting image is saved as "generated_image.png". A usage sketch with different sampling settings follows this list.
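A minimal usage sketch reusing the pipeline built in load_model; the prompt and sampling values here are illustrative, and the keyword arguments mirror the inference call shown below:
# Assumes stable_diffusion_model.get_input() and load_model() have already run.
image = stable_diffusion_model.pipeline(
    prompt="a watercolor painting of a lighthouse at dusk",  # illustrative prompt
    height=512,
    width=512,
    negative_prompt="worst quality",
    init_image=None,
    controlnet_img=None,
    strength=0.7,
    num_inference_steps=20,  # fewer steps trade quality for speed
    guidance_scale=9.0)      # higher values follow the prompt more closely
image.save("generated_image_2.png")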
import torch
import json
from model.model_set.model_base import BaseModel
from diffusers import PNDMScheduler
from model.model_set.models.multimodality.generative.stablediffusionv1_5.utils.stable_diffusion import StableDiffusionPipeline
class stablediffusionv1_5_sophgo(BaseModel):
def __init__(self):
super().__init__('multimodality/generative/stablediffusionv1_5')
self.stage = "singlize"
self.img_size = (512, 512)
self.model_path = "model/model_set/bmodel/multimodality/generative/stablediffusionv1_5"
self.tokenizer = "model/model_set/pytorch/multimodality/generative/stablediffusionv1_5/tokenizer_path"
def get_input(self):
self.prompt = "a photo of an astronaut riding a horse on mars"
self.scheduler = PNDMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
skip_prk_steps=True,
)
def load_model(self):
self.pipeline = StableDiffusionPipeline(
scheduler = self.scheduler,
model_path = self.model_path,
stage = self.stage,
tokenizer = self.tokenizer,
dev_id = 0,
controlnet_name = None,
processor_name = None,
)
def get_params_flops(self) -> list:
'float [params, flops]'
with open('config.json', 'r') as file:
config = json.load(file)
model_info = config.get('model_info', {}).get(self.model_identifier, {})
params = model_info.get('Params(M)', 'Not available')
flops = model_info.get('FLOPs(G)', 'Not available')
return [params, flops]
def inference(self):
image = self.pipeline(prompt = self.prompt,
height = self.img_size[0],
width = self.img_size[1],
negative_prompt = "worst quality",
init_image = None,
controlnet_img = None,
strength = 0.7,
num_inference_steps = 50,
guidance_scale = 7.5)
return image
def main():
# Create an instance of the stablediffusionv1_5_sophgo class
stable_diffusion_model = stablediffusionv1_5_sophgo()
# Prepare the input parameters
stable_diffusion_model.get_input()
# Load the model
stable_diffusion_model.load_model()
# Run inference
generated_image = stable_diffusion_model.inference()
generated_image.save("generated_image.png")
if __name__ == "__main__":
main()
Results
2%|███ | 1/50 [00:00<00:22, 2.20it/s]Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.0270 ms
Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.1190 ms
Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.0000 ms
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.2740 ms
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.3570 ms
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.1850 ms
Function[inference]-[bmrt_launch_tensor_ex] time use: 223.5890 ms
Function[sync_d2s]-[bm_memcpy_d2s_partial] time use: 0.2480 ms
....
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:11<00:00, 4.31it/s]
Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.0140 ms
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.2310 ms
Function[inference]-[bmrt_launch_tensor_ex] time use: 489.2620 ms
Function[sync_d2s]-[bm_memcpy_d2s_partial] time use: 1.7380 ms
A generated image is produced (saved as generated_image.png).