MD2JSON-T5-V1: Text-to-JSON Converter with T5

This model uses the T5 (Text-to-Text Transfer Transformer) architecture to convert structured text strings into valid JSON objects.

Description

The MD2JSON-T5-V1 model is trained to interpret text in which each line holds one key-value pair, with the key prefixed by # and separated from its value by a colon (e.g., #firstname: John), and to convert that text into a valid JSON object. As the examples below show, values can be strings, numbers, booleans, arrays, or nested objects.
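The input format described above can be illustrated with a small rule-based converter. This is only a sketch of the mapping the model is trained to reproduce, not how the model works internally; the function name `parse_tagged_text` is hypothetical, and it assumes one `#key: value` pair per line.

```python
import json


def parse_tagged_text(text: str) -> dict:
    """Rule-based reference conversion of '#key: value' lines to a dict.

    Illustrative only: the model learns this mapping from data and does
    not run code like this.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # skip blank or unrelated lines
        # Split on the first colon only, so values such as URLs stay intact.
        key, _, raw_value = line[1:].partition(":")
        raw_value = raw_value.strip()
        try:
            # Numbers, booleans, arrays, objects, and quoted strings
            # are already valid JSON fragments.
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            # Bare words such as John are kept as plain strings.
            value = raw_value
        result[key.strip()] = value
    return result
```

A converter like this can serve as a baseline or as a way to produce expected outputs when checking the model's predictions on well-formed input.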

Example:

  • Input:

    #firstname: John
    #lastname: Doe
    #age: 30
    #married: true
    #hobbies: ["gaming", "running"]
    #address: {"city": "Berlin", "zipcode": 10115}
    #url: "https://example.com"
    
  • Generated JSON Output:

    {
        "firstname": "John",
        "lastname": "Doe",
        "age": 30,
        "married": true,
        "hobbies": ["gaming", "running"],
        "address": {
            "city": "Berlin",
            "zipcode": 10115
        },
        "url": "https://example.com"
    }
    

Another Example:

  • Input:

    #name: Charlie
    #age: 29
    #isStudent: true
    #skills: ["Java", "Machine Learning"]
    #profile: {"github": "charlie29", "linkedin": "charlie-linkedin"}
    #height: 172.3
    
  • Generated JSON Output:

    {
        "name": "Charlie",
        "age": 29,
        "isStudent": true,
        "skills": ["Java", "Machine Learning"],
        "profile": {
            "github": "charlie29",
            "linkedin": "charlie-linkedin"
        },
        "height": 172.3
    }
    

Load the Model

To run inference, install the dependencies and then load the model and tokenizer as shown below.

Install Dependencies

pip install torch transformers datasets

Run Inference

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import json

# Load the tokenizer and model
model_name = "yahyakhoder/MD2JSON-T5-V1"  # Replace with your Hugging Face model path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Example Input
input_text = """#firstname: John
#lastname: Doe
#age: 30
#married: true
#hobbies: ["gaming", "running"]
#address: {"city": "Berlin", "zipcode": 10115}
#url: "https://example.com" """

# Tokenize and generate the output
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=256)
outputs = model.generate(**inputs, max_length=256, num_beams=4, early_stopping=True)

# Decode and convert to JSON
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
try:
    output_json = json.loads(result)
    print(json.dumps(output_json, indent=2, ensure_ascii=False))
except json.JSONDecodeError:
    print("Error during JSON conversion")
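A sequence-to-sequence model can emit truncated or malformed text, for instance when the output would exceed max_length, so it may be worth checking the result more strictly than a bare json.loads call. The sketch below is one possible approach under stated assumptions: `input_keys` and `convert_checked` are hypothetical helper names, and `generate_fn` stands for any callable that maps the input text to the decoded model output (for example, a function wrapping the tokenizer, model.generate, and tokenizer.decode calls above).

```python
import json
from typing import Callable, Optional


def input_keys(text: str) -> list:
    """Collect the key names from '#key: value' lines."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") and ":" in line:
            keys.append(line[1:].split(":", 1)[0].strip())
    return keys


def convert_checked(text: str, generate_fn: Callable[[str], str]) -> Optional[dict]:
    """Return the parsed object, or None if the generated text is not a
    JSON object that contains every key present in the input."""
    raw = generate_fn(text)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed or truncated output
    if not isinstance(parsed, dict):
        return None  # valid JSON, but not an object
    missing = [key for key in input_keys(text) if key not in parsed]
    return None if missing else parsed
```

Returning None keeps the caller in control of what happens next, such as retrying with a larger max_length or falling back to a rule-based conversion.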



Model Details

The YAML metadata at the top of the model card declares:

  • License: apache-2.0
  • Tags: text-to-json, t5, seq2seq, json-conversion, among others
  • Base model: t5-small
  • Model name: MD2JSON-T5-V1
  • Version: V1
  • Author: yahyakhoder