Framework Model Layer

Implements an inference pipeline for a ResNet model on a Sophgo chip, using the Sophon SAIL library for inference.

  • json is used to read the configuration file, numpy to handle the input data, and sophon.sail to run model inference on the Sophgo platform. sys.path.append dynamically extends the module search path so that BaseModel can be imported correctly.
  • The class inherits from BaseModel and calls the parent constructor via super().__init__('vision/classification/resnet') during initialization.
  • input_shape: the shape of the input image, (1, 3, 256, 256), i.e. one image with 3 channels (RGB) at 256x256 pixels. model_path: the path to the model file, here pointing to the ResNet .bmodel file.
  • astype(np.float32) converts the generated data to 32-bit floating point, a common data type for model inputs.
  • sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO): loads the model file onto the specified device and sets the I/O mode to SYSIO, meaning inputs and outputs reside in system memory. self.graph_name holds the model's graph name; a Sophgo model file may contain several computation graphs, and usually only the first one is needed.
  • Reads the configuration file config.json and, using the model identifier (self.model_identifier), looks up the model's parameter count and FLOPs (floating-point operations).
  • Runs inference with the loaded model by calling self.model.process(self.graph_name, self.input_data_dict), passing the graph name and the input data, and returns the inference result output.
  • Creates an instance of the resnet_sophgo class, resnet_model, and calls its methods in turn: preparing the input data, loading the model, fetching the parameter count and FLOPs, and finally running inference.
import json
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../../../../..')))
from model.model_set.model_base import BaseModel

import numpy as np
import sophon.sail as sail

class resnet_sophgo(BaseModel):
    def __init__(self):
        super().__init__('vision/classification/resnet')

        self.devices = 0 
        self.input_shape = (1, 3, 256, 256)
        self.model_path = '/home/aii-works/Benchmark_0822/model/bmodel/vision/classification/resnet/resnet_1684x_f32.bmodel'

    def get_input(self):
        self.image_input = np.random.randn(*self.input_shape).astype(np.float32)

    def load_model(self):
        self.model = sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO)
        self.graph_name = self.model.get_graph_names()[0]
        input_name_img = self.model.get_input_names(self.graph_name)
        self.input_data_dict = {input_name_img[0]: self.image_input}

    def get_params_flops(self) -> list:
        'float [params, flops]'

        with open('config.json', 'r') as file:
            config = json.load(file)
            model_info = config.get('model_info', {}).get(self.model_identifier, {})
            params = model_info.get('Params(M)', 'Not available')
            flops = model_info.get('FLOPs(G)', 'Not available')
        return [params, flops]

    def inference(self):
        output = self.model.process(self.graph_name, self.input_data_dict)
        return output

def main():
    # Create an instance of the resnet_sophgo class
    resnet_model = resnet_sophgo()
    
    # Step 1: Prepare the input data
    print("Preparing input data...")
    resnet_model.get_input()
    
    # Step 2: Load the model
    print("Loading model...")
    resnet_model.load_model()
    
    # Step 3: Retrieve model parameters and FLOPs
    print("Fetching model parameters and FLOPs...")
    params_flops = resnet_model.get_params_flops()
    print(f"Model Parameters (M): {params_flops[0]}")
    print(f"Model FLOPs (G): {params_flops[1]}")
    
    # Step 4: Perform inference
    print("Running inference...")
    output = resnet_model.inference()
    print("Inference success")

if __name__ == "__main__":
    main()

Results

Loading model...
open usercpu.so, init user_cpu_init 
Fetching model parameters and FLOPs...
Model Parameters (M): 25.557032
Model FLOPs (G): 10.797092864
Running inference...
Inference success
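
get_params_flops does not compute anything itself; it only looks up pre-computed values. The layout of config.json is not shown in this document, but judging from the keys the method reads, a minimal file would look roughly like the sketch below (the identifier is assumed to match the string passed to the BaseModel constructor, and the numbers are copied from the printed output above):

import json

# Hypothetical config.json layout matching what get_params_flops reads;
# the real benchmark file may contain additional fields.
example_config = {
    "model_info": {
        "vision/classification/resnet": {
            "Params(M)": 25.557032,
            "FLOPs(G)": 10.797092864
        }
    }
}

with open("config.json", "w") as file:
    json.dump(example_config, file, indent=4)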

Implements an inference pipeline for a BERT model, executed on a Sophgo chip using PyTorch and the Sophon SAIL library.

  • BertTokenizer and BertModel: BERT-related classes imported from the transformers library, used to tokenize text and load the BERT model. sophon.sail: used for model inference on the Sophgo platform.
  • Calls the parent constructor super().__init__('language/nlp/bert') with the model identifier, sets the device ID to 0 (i.e. the first device), and defines the model file path model_path and the tokenizer path tokenizer_path.
  • First defines the text to process, self.text, and sets the maximum sequence length to 256. Loads the BERT tokenizer from tokenizer_path and converts the text into the input format the model expects: return_tensors='pt' returns PyTorch tensors, padding='max_length' pads to the maximum length, and truncation=True truncates text that exceeds it.
  • Loads the model with the Sophon SAIL Engine class, passing the model path and device and setting the I/O mode to SYSIO. Retrieves the graph name self.graph_name (a model file may contain several computation graphs; the first one is used), then retrieves the input tensor name input_name_img and stores the input data (self.input_ids) in the dictionary self.input_data_dict for the subsequent inference call.
  • Runs inference with the loaded model via self.model.process, passing the graph name and input data, and returns the result output.
import torch
import json
from model.model_set.model_base import BaseModel
from transformers import BertTokenizer, BertModel
import sophon.sail as sail


class bert_sophgo(BaseModel):
    def __init__(self):
        super().__init__('language/nlp/bert')

        self.devices = 0
        self.model_path = 'model/model_set/bmodel/language/nlp/bert/bert4torchf32.bmodel'     
        self.tokenizer_path = "model/model_set/pytorch/language/nlp/bert/vocab"

        
    def get_input(self):
        self.text = "Hello, how are you?"
        self.max_length = 256
        self.tokenizer = BertTokenizer.from_pretrained(self.tokenizer_path)
        self.inputs = self.tokenizer(self.text, return_tensors='pt', padding='max_length', 
                                     truncation=True, max_length=self.max_length)
        self.input_ids = self.inputs['input_ids'].to(dtype=torch.float32).numpy()

    def load_model(self):
        self.model = sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO)
        self.graph_name = self.model.get_graph_names()[0]
        input_name_img = self.model.get_input_names(self.graph_name)
        self.input_data_dict = {input_name_img[0]: self.input_ids}
         
               
    def get_params_flops(self) -> list:
        'float [params, flops]'

        with open('config.json', 'r') as file:
            config = json.load(file)
            model_info = config.get('model_info', {}).get(self.model_identifier, {})
            params = model_info.get('Params(M)', 'Not available')
            flops = model_info.get('FLOPs(G)', 'Not available')
        return [params, flops]


    def inference(self):
        output = self.model.process(self.graph_name, self.input_data_dict)
        return output

def main():
    # Instantiate the model class
    bert_model = bert_sophgo()

    # Step 1: Get input
    bert_model.get_input()

    # Step 2: Load the model
    bert_model.load_model()
    print("Model loaded.")

    # Step 3: Perform inference
    output = bert_model.inference()
    # Step 4: Get model parameters and FLOPs
    params_flops = bert_model.get_params_flops()
    print(f"Model Parameters (in millions): {params_flops[0]}")
    print(f"Model FLOPs (in billions): {params_flops[1]}")

if __name__ == "__main__":
    main()

Results

Model loaded.
Model Parameters (in millions): 109.48224
Model FLOPs (in billions): 43.52704512
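
Engine.process returns a plain Python dictionary mapping output tensor names to numpy arrays (as the CLIP and UNet result dumps below illustrate). A minimal sketch for inspecting the BERT outputs, assuming the script above has already prepared the input and loaded the model:

# Sketch: inspect the dictionary returned by Engine.process.
output = bert_model.inference()
for name, tensor in output.items():
    print(name, tensor.shape, tensor.dtype)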

Implements running a CLIP (Contrastive Language-Image Pre-training) model with Sophon SAIL.

  • __init__ method: initializes the class and calls the parent constructor with the model type identifier ('multimodality/classification/clip').
  • self.text: the list of text labels to encode and process. self.input_shape: the shape of the input image (batch size, channels, height, width), here a 1x3x224x224 tensor. self.text_net_batch_size: the batch size of the text network, set to 1. self.device: uses the GPU if a CUDA device is available, otherwise the CPU. self.image_model_path: the path to the image model file. self.text_model_path: the path to the text model file.
  • self.image_input: a random floating-point array with the same shape as the input. self.text_input: tokenizes and encodes the text by calling the encode_text method.
  • sail.Engine: loads the model at the given path. get_graph_names: retrieves the model's graph names. get_input_names: retrieves the model's input names, which are used to build the input data dictionaries.
  • Calls each model's process method to run the forward pass and returns the results.
import torch
import json
import numpy as np
import sophon.sail as sail
from model.model_set.model_base import BaseModel
from model.model_set.models.multimodality.classification.clip.utils.simpletokenizer import tokenize_tpu

class clip_sophgo(BaseModel):
    def __init__(self):
        super().__init__('multimodality/classification/clip')

        self.text = ["a diagram", "a dog", "a cat"]
        self.input_shape =(1, 3, 224, 224)
        self.text_net_batch_size = 1
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.image_model_path = 'model/model_set/bmodel/multimodality/classification/clip/clip_image_vitb32_bm1684x_f16.bmodel'
        self.text_model_path = 'model/model_set/bmodel/multimodality/classification/clip/clip_text_vitb32_bm1684x_f16.bmodel'

    def get_input(self):
        self.image_input = np.random.randn(*self.input_shape).astype(np.float32)
        self.text_input = self.encode_text(tokenize_tpu(self.text))

    def load_model(self):
        self.image_net = sail.Engine(self.image_model_path, 0, sail.IOMode.SYSIO)
        self.text_net = sail.Engine(self.text_model_path, 0, sail.IOMode.SYSIO)
        self.graph_name_img = self.image_net.get_graph_names()[0]
        input_name_img = self.image_net.get_input_names(self.graph_name_img)
        self.input_data_dict_img = {input_name_img[0]: self.image_input}
        self.graph_name_text = self.text_net.get_graph_names()[0]
        input_name_text = self.text_net.get_input_names(self.graph_name_text)
        self.input_data_dict_text = {input_name_text[0]: self.text_input}

    def encode_text(self, text):
        # Slice the tokenized texts into chunks of text_net_batch_size and zero-pad the last
        # chunk; note that only the final (padded) chunk is returned, so the text network
        # always receives a tensor matching its fixed batch size.
        text_batch = text.shape[0]
        if text_batch > self.text_net_batch_size:
            for start_idx in range(0, text_batch, self.text_net_batch_size):
                end_idx = min(start_idx + self.text_net_batch_size, text_batch)  # Ensure end_idx does not exceed text_batch
                batch_slice = text[start_idx:end_idx]
                if batch_slice.shape[0] < self.text_net_batch_size:
                    padding_size = self.text_net_batch_size - batch_slice.shape[0]
                    batch_slice = np.concatenate([batch_slice, np.zeros((padding_size, *batch_slice.shape[1:]), dtype=batch_slice.dtype)], axis=0)
            return batch_slice
        else:
            return text
        
    def get_params_flops(self) -> list:
        'float [params, flops]'

        with open('config.json', 'r') as file:
            config = json.load(file)
            model_info = config.get('model_info', {}).get(self.model_identifier, {})
            params = model_info.get('Params(M)', 'Not available')
            flops = model_info.get('FLOPs(G)', 'Not available')
        return [params, flops]

    def inference(self):
        img_results = self.image_net.process(self.graph_name_img, self.input_data_dict_img)
        txt_results = self.text_net.process(self.graph_name_text, self.input_data_dict_text)
        return img_results, txt_results
    
def main():
    # Create an instance of the CLIP model class
    clip_model = clip_sophgo()

    print("Preparing input data...")
    clip_model.get_input()

    print("Loading models...")
    clip_model.load_model()
    print("Models loaded.")

    print("Fetching model parameters and FLOPs...")
    params_flops = clip_model.get_params_flops()
    print(f"Model Parameters (in millions): {params_flops[0]}")
    print(f"Model FLOPs (in billions): {params_flops[1]}")

    print("Running inference...")
    img_results, txt_results = clip_model.inference()
    print("Inference success.")

    # Print the image and text inference results
    print("Image results:", img_results)
    print("Text results:", txt_results)

if __name__ == "__main__":
    main()

Results

Models loaded.
Fetching model parameters and FLOPs...
Model Parameters (in millions): 151.277313
Model FLOPs (in billions): 17.520132096
Running inference...
Inference success.
Image results: {'output_MatMul_f32': array([[-1.69525146e-02, -6.65893555e-02,  2.46215820e-01,
         5.56640625e-02,  7.07397461e-02,  1.19567871e-01,
        -7.79418945e-02,  7.28027344e-01,  2.84912109e-01,
        ...,
         9.34600830e-03,  7.61795044e-03,  2.84423828e-01,
        -4.71923828e-01,  3.02001953e-01]], dtype=float32)}
Text results: {'output_LayerNormalization_f32': array([[[ 0.33911133,  0.11663818,  0.10198975, ...,  0.24694824,
          0.5908203 ,  0.10131836],
        [ 1.9746094 , -0.58447266,  0.36865234, ...,  1.1679688 ,
          0.8051758 , -0.9785156 ],
        ...,
        [ 0.21704102, -0.34692383, -0.6845703 , ...,  0.5913086 ,
         -0.08435059, -1.4951172 ],
        [ 0.54345703, -0.23352051, -0.9902344 , ...,  0.09265137,
         -0.04849243, -1.7587891 ]]], dtype=float32)}
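
In a complete CLIP pipeline the image and text embeddings are L2-normalized and compared with a scaled dot product followed by a softmax over the text labels. The tensors dumped above are still raw network outputs (the text result, for example, is a per-token LayerNorm output that would normally still go through EOT-token selection and the text projection), so the sketch below only illustrates the comparison step, assuming image_features of shape (1, d) and text_features of shape (N, d) are the final embeddings:

import numpy as np

# Sketch: CLIP-style similarity, assuming final (1, d) image and (N, d) text embeddings.
def clip_similarity(image_features, text_features):
    image_features = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    text_features = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    logits = 100.0 * image_features @ text_features.T   # logit scale of 100, as in CLIP
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax over texts
    return probs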

Demonstrates model inference on either a GPU or a TPU, together with statistics on the model's FLOPs (floating-point operations) and parameter count. The ERNIE 3.0 model is used; inference is executed in the selected hardware mode (GPU or TPU), and inference performance metrics are measured at the end.

  • os: used to check and manipulate file paths. time: used to time inference (for latency and FPS). torch: the PyTorch library, used to handle the deep learning model. requests: used to download the model weight file. transformers.BertTokenizer and ErnieModel: used to load the ERNIE 3.0 model and its tokenizer. tpu_perf.infer.SGInfer: used for TPU inference. thop.profile: used to compute the model's FLOPs. numpy: used for array handling, particularly in TPU mode.
  • The ernie3 class wraps the ERNIE 3.0 model and supports inference in GPU and TPU modes. mode: selects whether inference runs on the GPU or the TPU; valid values are gpu and tpu. text: the text to run inference on. max_length: the maximum sequence length used by the tokenizer. model_path: the path to the model weights. tokenizer_path: the path to the tokenizer configuration files.
  • Loads the tokenizer and encodes the input text (converting it into a format the model understands). In gpu mode: the device is set to CUDA if available, otherwise CPU; the model weights are checked and downloaded if necessary, the ERNIE 3.0 model is loaded onto the device, and the encoded text is converted to tensors ready for GPU inference. In tpu mode: the text is encoded and the encoded inputs are converted to numpy arrays for TPU processing.
  • In TPU mode, the SGInfer class loads the BModel file onto the TPU, inference is run repeatedly (100 times in this example), and the per-inference latency (in milliseconds) and FPS (frames per second) are computed to evaluate the model's inference speed on the TPU.
import os
import time
import torch
import requests
from transformers import BertTokenizer, ErnieModel
from tpu_perf.infer import SGInfer
from thop import profile
import numpy as np

def download_model_weights(model_path):
    if not os.path.exists(os.path.join(model_path, 'pytorch_model.bin')):
        print("Weight file not found; downloading weights from Hugging Face...")
        model_url = "https://huggingface.co/nghuyong/ernie-3.0-medium-zh/resolve/main/pytorch_model.bin?download=true"
        response = requests.get(model_url)
        if response.status_code == 200:
            with open(os.path.join(model_path, 'pytorch_model.bin'), 'wb') as f:
                f.write(response.content)
            print("Weights downloaded.")
        else:
            print("Weight download failed; please check the network connection or the URL.")

class ernie3:
    def __init__(self, mode='gpu', text="Hello, how are you?", max_length=256, model_path='/home/aii-works/Benchmark_refactoring/model/model_set/pytorch/language/nlp/ernie3/vocab', tokenizer_path='/home/aii-works/Benchmark_refactoring/model/model_set/pytorch/language/nlp/ernie3/vocab'):
        self.mode = mode
        self.text = text
        self.max_length = max_length
        self.tokenizer_path = tokenizer_path
        self.model_path = model_path
        self.tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
        
        if mode == 'gpu':
            self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            download_model_weights(model_path)
            self.model = ErnieModel.from_pretrained(model_path).to(self.device)
            self.inputs = self.tokenizer(text=self.text, return_tensors='pt', padding='max_length', max_length=self.max_length).to(self.device)
        elif mode == 'tpu':
            self.inputs = self.tokenizer(text=text, return_tensors='pt', padding='max_length', max_length=max_length)
            self.input_ids = self.inputs['input_ids'].numpy().astype(np.int32)
    
        else:
            raise ValueError("Mode should be either 'gpu' or 'tpu'")

    def count_parameters_and_flops(self):
        # thop.profile returns MACs; multiply by 2 to approximate FLOPs, then scale to GFLOPs and millions of parameters
        flops, _ = profile(self.model, (self.inputs.input_ids, self.inputs.attention_mask), verbose=False)
        params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
        return flops / 1e9 * 2, params / 1e6

    def forward(self):
        if self.mode == 'gpu':
            outputs = self.model(**self.inputs)
            return outputs
        elif self.mode == 'tpu':
            return self.input_ids
        else:
            raise ValueError("Mode should be either 'gpu' or 'tpu'")

if __name__ == '__main__':
    mode = 'tpu'  # Change to 'gpu' for GPU mode

    model = ernie3(mode=mode)
    
    if mode == 'gpu':
        for _ in range(1):
            with torch.no_grad():
                outputs = model.forward()
        flops, params = model.count_parameters_and_flops()
        print(f"FLOPs: {flops} GFLOPs")
        print(f"Parameters: {params} Million")
    elif mode == 'tpu':
        bmodel_path = "/home/aii-works/Benchmark_refactoring/model/model_set/bmodel/language/nlp/ernie3/ernie3_1684x_f32.bmodel"
        net = SGInfer(bmodel_path, devices=[0])
        input = model.forward()
        iterations = 100
        t_start = time.time()
        for _ in range(iterations):
            output = net.infer_one(input)
        elapsed_time = time.time() - t_start
        latency = elapsed_time / iterations * 1000
        FPS = 1000 / latency
        print(f"FPS: {FPS:.2f}")
        print(f"Latency: {latency:.2f} ms")

Results

[tid=30a25000] INFO: USING DEVICES: 0 
[tid=30a25000] INFO: init context on device 0
open usercpu.so, init user_cpu_init 
[tid=30a25000] INFO: NetName: ernie3
[tid=30a25000] INFO:   Input 0) 'input_ids' shape=[ 1 256 ] dtype=INT32 scale=1
[tid=30a25000] INFO:   Output 0) 'output_LayerNormalization' shape=[ 1 256 768 ] dtype=FLOAT32 scale=1
[tid=30a25000] INFO:   Output 1) '855_Tanh' shape=[ 1 768 ] dtype=FLOAT32 scale=1
FPS: 51.46
Latency: 19.43 ms
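
The script only times the TPU path. A symmetrical measurement for the GPU path is sketched below (not part of the original script); torch.cuda.synchronize is called so that the timer covers the actual kernel execution rather than just the asynchronous launches:

import time
import torch

# Sketch: timing the GPU path with the same latency/FPS metrics as the TPU path.
gpu_model = ernie3(mode='gpu')
iterations = 100
with torch.no_grad():
    gpu_model.forward()                      # warm-up run
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t_start = time.time()
    for _ in range(iterations):
        gpu_model.forward()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
elapsed_time = time.time() - t_start
latency = elapsed_time / iterations * 1000
print(f"FPS: {1000 / latency:.2f}")
print(f"Latency: {latency:.2f} ms")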

Defines a UNet-based image segmentation class that handles model loading, input generation, inference, and parameter lookup, intended for deep learning tasks on the Sophon AI framework.

  • unet_sophgo: this class inherits from BaseModel, adding UNet-specific functionality on top of it.
  • super().__init__('vision/segmentation/unet'): calls the base-class constructor with the model identifier, which sets up common model attributes. self.devices: initialized to 0, the device ID the model will use for inference. self.input_shape: defines the input tensor shape, here a batch of 1 image with 3 color channels (RGB) at 640x640. self.model_path: the path to the model file, a binary model file (.bmodel) of the UNet architecture.
  • self.model: creates the model instance with sail.Engine, loading the model from the given path and setting the device and I/O mode. self.graph_name: retrieves the model graph name. input_name_img: retrieves the input node name. self.input_data_dict: builds a dictionary mapping the input node name to the input image tensor.
  • get_params_flops: reads the configuration file and returns the model's parameter count (in millions) and FLOPs (floating-point operations, in billions).
  • Calls get_params_flops to fetch and print the parameter count and FLOPs, then calls inference to run the model and prints the output.
import json
import numpy as np
from model.model_set.model_base import BaseModel
import sophon.sail as sail

class unet_sophgo(BaseModel):
    def __init__(self):
        super().__init__('vision/segmentation/unet')

        self.devices = 0 
        self.input_shape = (1, 3, 640, 640)
        self.model_path = 'model/model_set/bmodel/vision/segmentation/unet/unet_1684x_f32.bmodel'

    def get_input(self):
        self.image_input = np.random.randn(*self.input_shape).astype(np.float32)

    def load_model(self):
        self.model = sail.Engine(self.model_path, self.devices, sail.IOMode.SYSIO)
        self.graph_name = self.model.get_graph_names()[0]
        input_name_img = self.model.get_input_names(self.graph_name)
        self.input_data_dict = {input_name_img[0]: self.image_input}

    def get_params_flops(self) -> list:
        'float [params, flops]'

        with open('config.json', 'r') as file:
            config = json.load(file)
            model_info = config.get('model_info', {}).get(self.model_identifier, {})
            params = model_info.get('Params(M)', 'Not available')
            flops = model_info.get('FLOPs(G)', 'Not available')
        return [params, flops]

    def inference(self):
        output = self.model.process(self.graph_name, self.input_data_dict)
        return output

def main():
    # Create an instance of the UNet class
    unet_model = unet_sophgo()
    
    # Prepare the input
    unet_model.get_input()
    
    # Load the model
    unet_model.load_model()
    
    # Get the model parameters and FLOPs
    params_flops = unet_model.get_params_flops()
    print(f"Model Parameters: {params_flops[0]}M, FLOPs: {params_flops[1]}G")
    
    # Run inference
    output = unet_model.inference()
    
    # Print the output
    print("Inference Output:", output)

if __name__ == "__main__":
    main()

Results

Model Parameters: 31.032915M, FLOPs: 683.5666944G
Inference Output: {'output_Conv': array([[[[ 1.4099784 ,  0.43080187, -0.13301468, ...,  1.1241736 ,
           1.2472477 ,  1.9289322 ],
         [ 0.25994158, -0.73382187, -0.9940162 , ..., -0.780355  ,
          -0.5162163 ,  0.37734842],
....
         [-3.3069534 , -3.0841465 , -2.9631705 , ..., -2.723567  ,
          -2.7471972 , -2.958744  ]]]], dtype=float32)}
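
The 'output_Conv' tensor above holds raw per-class logits of shape (1, num_classes, 640, 640). A typical post-processing step, not included in the script, takes an argmax over the channel dimension to obtain a per-pixel class mask (a sketch, assuming unet_model has already been prepared as in main):

import numpy as np

# Sketch: convert raw UNet logits into a per-pixel class mask.
output = unet_model.inference()
logits = output['output_Conv']            # shape (1, num_classes, 640, 640)
mask = np.argmax(logits, axis=1)[0]       # shape (640, 640): one class index per pixel
print("Mask shape:", mask.shape, "classes present:", np.unique(mask))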

Defines a Python class, stablediffusionv1_5_sophgo, that generates images with a Stable Diffusion model, together with a main function that drives the image-generation process.

  • super().__init__('...'): calls the parent initializer with the model identifier. self.stage: sets the stage of the generative model (here 'singlize', presumably single-image generation). self.img_size: sets the generated image size to 512x512 pixels. self.model_path and self.tokenizer: the paths to the model and to the tokenizer, respectively.
  • self.prompt: the text prompt used to generate the image. self.scheduler: creates a PNDM scheduler instance that configures the diffusion sampling parameters.
  • Creates a StableDiffusionPipeline instance using the scheduler, model path, tokenizer, and other parameters defined above.
  • Generates the image with self.pipeline, passing the prompt, image height and width, negative prompt, strength, number of inference steps, and guidance scale.
  • Creates an instance of stablediffusionv1_5_sophgo, calls get_input to prepare the inputs, load_model to load the model, and inference to run the pipeline and generate the image, then saves the result as "generated_image.png".
import torch
import json
from model.model_set.model_base import BaseModel
from diffusers import PNDMScheduler
from model.model_set.models.multimodality.generative.stablediffusionv1_5.utils.stable_diffusion import StableDiffusionPipeline

class stablediffusionv1_5_sophgo(BaseModel):
    def __init__(self):
        super().__init__('multimodality/generative/stablediffusionv1_5')

        self.stage = "singlize"
        self.img_size = (512, 512)
        self.model_path = "model/model_set/bmodel/multimodality/generative/stablediffusionv1_5"
        self.tokenizer = "model/model_set/pytorch/multimodality/generative/stablediffusionv1_5/tokenizer_path"

    def get_input(self):
        self.prompt = "a photo of an astronaut riding a horse on mars"

        self.scheduler = PNDMScheduler(
                beta_start=0.00085,
                beta_end=0.012,
                beta_schedule="scaled_linear",
                skip_prk_steps=True,
            )

    def load_model(self):
        self.pipeline = StableDiffusionPipeline(
                scheduler = self.scheduler,
                model_path = self.model_path,
                stage = self.stage,
                tokenizer = self.tokenizer,
                dev_id = 0,
                controlnet_name = None,
                processor_name = None,
            ) 

    def get_params_flops(self) -> list:
        'float [params, flops]'

        with open('config.json', 'r') as file:
            config = json.load(file)
            model_info = config.get('model_info', {}).get(self.model_identifier, {})
            params = model_info.get('Params(M)', 'Not available')
            flops = model_info.get('FLOPs(G)', 'Not available')
        return [params, flops]

    def inference(self):
        image = self.pipeline(prompt=self.prompt,
                              height=self.img_size[0],
                              width=self.img_size[1],
                              negative_prompt="worst quality",
                              init_image=None,
                              controlnet_img=None,
                              strength=0.7,
                              num_inference_steps=50,
                              guidance_scale=7.5)
        return image

def main():
    # Create an instance of the StableDiffusion class
    stable_diffusion_model = stablediffusionv1_5_sophgo()
    
    # Prepare the input
    stable_diffusion_model.get_input()
    
    # Load the model
    stable_diffusion_model.load_model()
    
    # Run inference
    generated_image = stable_diffusion_model.inference()
    
    generated_image.save("generated_image.png")  

if __name__ == "__main__":
    main()

Results


  2%|███                                                                                                                                                      | 1/50 [00:00<00:22,  2.20it/s]Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.0270 ms 
Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.1190 ms 
Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.0000 ms 
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.2740 ms 
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.3570 ms 
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.1850 ms 
Function[inference]-[bmrt_launch_tensor_ex] time use: 223.5890 ms 
Function[sync_d2s]-[bm_memcpy_d2s_partial] time use: 0.2480 ms
....
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:11<00:00,  4.31it/s]
Function[reset_sys_data]-[memcpy_cpu_to_cpu_0] time use: 0.0140 ms 
Function[sync_s2d]-[bm_memcpy_s2d_partial] time use: 0.2310 ms 
Function[inference]-[bmrt_launch_tensor_ex] time use: 489.2620 ms 
Function[sync_d2s]-[bm_memcpy_d2s_partial] time use: 1.7380 ms 
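
The per-step times in the log come from inside the pipeline; to measure the whole generation end to end, a simple wrapper around the inference call is enough (a sketch, reusing the stable_diffusion_model instance from main):

import time

# Sketch: time one full text-to-image generation with the pipeline above.
t_start = time.time()
generated_image = stable_diffusion_model.inference()
print(f"End-to-end generation time: {time.time() - t_start:.2f} s")
generated_image.save("generated_image.png")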

Generated image
