BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-summary

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

[Figure: hierarchy of topics]

Usage

To use this model, please install BERTopic:

pip install -U -q bertopic safetensors

You can use the model as follows:

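from bertopic import BERTopic

# Load the trained topic model from the Hugging Face Hub
topic_model = BERTopic.load("pszemraj/BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-summary")

# Interactive intertopic distance map
topic_model.visualize_topics()

# For a dataframe of topic IDs, counts, and labels:
# topic_model.get_topic_info()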

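Predicting new instances:

# text: a string or a list of strings to assign topics to
# transform returns the predicted topic IDs and their probabilities (not embeddings)
topics, probs = topic_model.transform(text)
print(topics)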
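
When assigning topics to several documents at once, a small wrapper can pair each document with its predicted topic ID. The sketch below assumes `topic_model` has already been loaded as shown above; `predict_topics` is a hypothetical helper for illustration, not part of BERTopic.

```python
def predict_topics(topic_model, docs):
    """Pair each document with the topic ID predicted by a loaded BERTopic model."""
    if isinstance(docs, str):
        docs = [docs]  # treat a single string as one document
    # transform returns (topic IDs, probabilities); only the IDs are used here
    topics, _probs = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]
```

A topic ID of -1 means the document was treated as an outlier rather than assigned to one of the numbered topics.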

Topic overview

Number of topics: 24 (including the outlier topic -1)
Number of training documents: 1960

Overview of all topics:

Topic ID	Topic Keywords	Topic Frequency	Label
-1	no_saic_raw_sp - sep_4 - sec - data - image	13	-1_no_saic_raw_sp_sep_4_sec_data
0	lecture - applications - methods - learning - topics	104	0_lecture_applications_methods_learning
1	cogvideo - videos - cogview2 - cog - video	303	1_cogvideo_videos_cogview2_cog
2	ship - rainsford - hunted - island - hunts	117	2_ship_rainsford_hunted_island
3	films - dissertation - film - noir - identity	106	3_films_dissertation_film_noir
4	linguistics - language - languages - foundational - systems	104	4_linguistics_language_languages_foundational
5	nemo - dory - transcript - clownfish - fish	103	5_nemo_dory_transcript_clownfish
6	train - bruno - washington - station - tennis	102	6_train_bruno_washington_station
7	images - representations - image - captions - representation	102	7_images_representations_image_captions
8	merge - merging - explain - concept - problems	102	8_merge_merging_explain_concept
9	enhancement - enhancing - recordings - improve - waveforms	100	9_enhancement_enhancing_recordings_improve
10	arendelle - elsa - frozen - kristoff - olaf	99	10_arendelle_elsa_frozen_kristoff
11	scene - story - script - movie - gillis	97	11_scene_story_script_movie
12	lecture - lemmatization - nlp - medical - techniques	96	12_lecture_lemmatization_nlp_medical
13	questions - topics - conversation - terrance - talk	85	13_questions_topics_conversation_terrance
14	sniper - kill - fury - combat - narrator	81	14_sniper_kill_fury_combat
15	images - lecture - ezurich - pathology - medical	67	15_images_lecture_ezurich_pathology
16	timeseries - framework - interpretability - representations - next_concept	37	16_timeseries_framework_interpretability_representations
17	prediction - predictions - forecasting - predict - markov	27	17_prediction_predictions_forecasting_predict
18	images - imaging - computational - convolutional - lecture	27	18_images_imaging_computational_convolutional
19	technology - treatment - methods - medical - detection	27	19_technology_treatment_methods_medical
20	novel - translation - henry - read - learn	23	20_novel_translation_henry_read
21	abridged - brief - synopsis - short - citations	22	21_abridged_brief_synopsis_short
22	lecture - pathology - medical - computational - patients	16	22_lecture_pathology_medical_computational
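
Judging from the rows above, each Label is the topic ID followed by the first four keywords, joined with underscores. A minimal sketch of that pattern, where `make_label` is a hypothetical helper written for illustration and not a BERTopic function:

```python
def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and its first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])

# Keywords for topic 1, split from the "Topic Keywords" column
row_keywords = "cogvideo - videos - cogview2 - cog - video".split(" - ")
print(make_label(1, row_keywords))  # 1_cogvideo_videos_cogview2_cog
```

Keywords that already contain underscores (as in topic -1) are kept as they are, which is why that label looks longer than the others.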

Training hyperparameters

calculate_probabilities: True
language: None
low_memory: False
min_topic_size: 10
n_gram_range: (1, 1)
nr_topics: None
seed_topic_list: None
top_n_words: 10
verbose: True
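
To set up a comparable model, these values can be passed to the `BERTopic` constructor as keyword arguments. The sketch below is an illustration, not the author's training script. The embedding model is not listed among the hyperparameters (the model name suggests a sentence-t5-xl variant, but the exact checkpoint is an assumption), so it is left as a parameter, and `build_topic_model` is a hypothetical helper.

```python
# Hyperparameters copied from the list above
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}

def build_topic_model(embedding_model=None):
    """Create an unfitted BERTopic model with the hyperparameters above.

    embedding_model is an assumption: pass the sentence-transformers
    model (or its name) that should be used to embed the documents.
    """
    # Imported lazily so HYPERPARAMS can be inspected without bertopic installed
    from bertopic import BERTopic
    return BERTopic(embedding_model=embedding_model, **HYPERPARAMS)
```

Calling `build_topic_model(...).fit_transform(docs)` on a different corpus, or with a different embedding model, will not reproduce the topics listed in the overview.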

Framework versions

Numpy: 1.22.4
HDBSCAN: 0.8.29
UMAP: 0.5.3
Pandas: 1.5.3
Scikit-Learn: 1.2.2
Sentence-transformers: 2.2.2
Transformers: 4.29.2
Numba: 0.56.4
Plotly: 5.13.1
Python: 3.10.11
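
Version differences between the training and loading environments can sometimes cause problems when loading a saved model. The following sketch compares the installed packages against the versions listed above. The mapping from the names in the list to PyPI distribution names (for example UMAP to `umap-learn`) is an assumption, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from the list above, keyed by PyPI distribution name (assumed mapping)
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}

def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed or None)} for packages that differ."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed
        if have != want:
            mismatches[name] = (want, have)
    return mismatches
```

An empty result means every listed package matches; a mismatch is not necessarily a problem, only a place to look first if loading fails.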
