metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

C2-Topic-Model-100

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("AlexanderHolmes0/C2-Topic-Model-100")

topic_model.get_topic_info()
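Once loaded, the model can also assign topics to unseen text. The following is a minimal sketch built on BERTopic's `transform` method; the helper name `assign_topics`, the `demo` function, and the sample sentences are illustrative assumptions and not part of this model card. Depending on how the model was saved, you may need to pass an `embedding_model` argument to `BERTopic.load`.

```python
from typing import Iterable, List, Tuple


def assign_topics(topic_model, docs: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by a fitted BERTopic model.

    `topic_model` is expected to expose BERTopic's `transform(documents)`,
    which returns `(topics, probabilities)`. Topic -1 is the outlier topic.
    """
    docs = list(docs)
    topics, _ = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    # Requires `pip install -U bertopic` and access to the Hugging Face Hub.
    from bertopic import BERTopic

    topic_model = BERTopic.load("AlexanderHolmes0/C2-Topic-Model-100")
    sample_docs = [  # hypothetical inputs, chosen only for illustration
        "The app is easy to use and very convenient.",
        "Presale tickets sold out within minutes.",
    ]
    for doc, topic in assign_topics(topic_model, sample_docs):
        print(topic, doc)
```

Call `demo()` to run the example end to end; the returned topic IDs can be looked up in the output of `topic_model.get_topic_info()`.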

An example of the ChatGPT (GPT-3.5 Turbo) representations: "multiaspect.png"

Topic overview

  • Number of topics: 100
  • Number of training documents: 828299
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | you - to - your - and - that | 546 | -1_you_to_your_and |
| 0 | price - of - in - value - index | 34030 | 0_price_of_in_value |
| 1 | you - card - credit - your - to | 102457 | 1_you_card_credit_your |
| 2 | rating - trust - shares - quaer - stock | 171217 | 2_rating_trust_shares_quaer |
| 3 | game - washington - wizards - the - season | 30538 | 3_game_washington_wizards_the |
| 4 | app - easy - love - great - my | 57205 | 4_app_easy_love_great |
| 5 | and - to - the - in - about | 27829 | 5_and_to_the_in |
| 6 | deals - save - com - black - deal | 67826 | 6_deals_save_com_black |
| 7 | app - my - it - me - account | 23384 | 7_app_my_it_me |
| 8 | easy - use - very - convenient - navigate | 25012 | 8_easy_use_very_convenient |
| 9 | tickets - ticketmaster - swift - presale - taylor | 20637 | 9_tickets_ticketmaster_swift_presale |
| 10 | you - work - to - that - it | 14825 | 10_you_work_to_that |
| 11 | rating - stock - shares - 00 - quaer | 16734 | 11_rating_stock_shares_00 |
| 12 | patent - virginia - alexandria - inventors - assigned | 9483 | 12_patent_virginia_alexandria_inventors |
| 13 | de - la - en - el - que | 5562 | 13_de_la_en_el |
| 14 | it - you - wiki - that - he | 7589 | 14_it_you_wiki_that |
| 15 | biden - capitol - that - trump - the | 29399 | 15_biden_capitol_that_trump |
| 16 | great - good - excellent - awesome - very | 15650 | 16_great_good_excellent_awesome |
| 17 | hi - hey - me - hello - hu | 9391 | 17_hi_hey_me_hello |
| 18 | cou - county - case - to - the | 3496 | 18_cou_county_case_to |
| 19 | arena - center - tour - music - tickets | 9937 | 19_arena_center_tour_music |
| 20 | ihearadio - jingle - ball - photo - her | 16963 | 20_ihearadio_jingle_ball_photo |
| 21 | and - to - the - you - for | 7139 | 21_and_to_the_you |
| 22 | garcia - davis - fight - gervonta - he | 13708 | 22_garcia_davis_fight_gervonta |
| 23 | ihearadio - photo - festival - music - getty | 3245 | 23_ihearadio_photo_festival_music |
| 24 | farm - farmers - loans - loan - mogage | 4962 | 24_farm_farmers_loans_loan |
| 25 | banks - bank - zelle - overdraft - fees | 2692 | 25_banks_bank_zelle_overdraft |
| 26 | covid - nyc - bar - search - map | 8970 | 26_covid_nyc_bar_search |
| 27 | helix - solutions - aris - energy - water | 1398 | 27_helix_solutions_aris_energy |
| 28 | bs - blt - bue - crap - bib | 1635 | 28_bs_blt_bue_crap |
| 29 | matador - ironwood - resources - mtdr - company | 1286 | 29_matador_ironwood_resources_mtdr |
| 30 | doubleverify - dv - shopify - rating - stock | 1336 | 30_doubleverify_dv_shopify_rating |
| 31 | __ - http - add - me - please | 1184 | 31____http_add_me |
| 32 | amphastar - crowdstrike - pharmaceuticals - rating - stock | 1583 | 32_amphastar_crowdstrike_pharmaceuticals_rating |
| 33 | boy - wentz - band - fob - ego | 1087 | 33_boy_wentz_band_fob |
| 34 | nabors - industries - arvinas - rating - drilling | 1233 | 34_nabors_industries_arvinas_rating |
| 35 | arena - blink - center - may - 182 | 1215 | 35_arena_blink_center_may |
| 36 | biosciences - akoya - biopharmaceuticals - stock - of | 1573 | 36_biosciences_akoya_biopharmaceuticals_stock |
| 37 | therapeutics - rapt - chimerix - stock - rating | 1795 | 37_therapeutics_rapt_chimerix_stock |
| 38 | written - news - episode - video - the | 1170 | 38_written_news_episode_video |
| 39 | one - capital - you - it - that | 5740 | 39_one_capital_you_it |
| 40 | wizards - homebody - predictions - odds - picks | 13044 | 40_wizards_homebody_predictions_odds |
| 41 | sweepstakes - sponsor - any - prize - or | 2093 | 41_sweepstakes_sponsor_any_prize |
| 42 | cameron - her - getty - images - photo | 852 | 42_cameron_her_getty_images |
| 43 | arena - center - aug - drake - tour | 785 | 43_arena_center_aug_drake |
| 44 | eur1 - eur - of - price - in | 1120 | 44_eur1_eur_of_price |
| 45 | puth - like - love - he - song | 2216 | 45_puth_like_love_he |
| 46 | easy - very - peasy - fast - quick | 1798 | 46_easy_very_peasy_fast |
| 47 | immersive - vr - tech - forward - looking | 2191 | 47_immersive_vr_tech_forward |
| 48 | chargepoint - chpt - rating - stock - shares | 1611 | 48_chargepoint_chpt_rating_stock |
| 49 | chvrches - hott - stranding - her - mayberry | 592 | 49_chvrches_hott_stranding_her |
| 50 | tkk - tk - tzsarbreaux - 20se - ttds | 563 | 50_tkk_tk_tzsarbreaux_20se |
| 51 | morello - white - belasco - untamable - unpredictable | 630 | 51_morello_white_belasco_untamable |
| 52 | our - peapack - gladstone - we - and | 561 | 52_our_peapack_gladstone_we |
| 53 | white - footage - ihearadio - his - beck | 677 | 53_white_footage_ihearadio_his |
| 54 | libey - energy - lb - oilfield - stock | 668 | 54_libey_energy_lb_oilfield |
| 55 | trump - investigation - james - attorney - his | 853 | 55_trump_investigation_james_attorney |
| 56 | adr - adrs - mellon - of - in | 1024 | 56_adr_adrs_mellon_of |
| 57 | energy - rating - chesapeake - oil - research | 3156 | 57_energy_rating_chesapeake_oil |
| 58 | center - arena - 12 - tso - ghosts | 2762 | 58_center_arena_12_tso |
| 59 | socure - identity - verification - fraud - ventures | 484 | 59_socure_identity_verification_fraud |
| 60 | dua - her - pop - lipa - ihearadio | 567 | 60_dua_her_pop_lipa |
| 61 | laroi - kid - his - song - unreleased | 747 | 61_laroi_kid_his_song |
| 62 | fennec - pharmaceuticals - fenc - rating - stock | 413 | 62_fennec_pharmaceuticals_fenc_rating |
| 63 | lizzo - dance - kardashian - her - noh | 466 | 63_lizzo_dance_kardashian_her |
| 64 | saul - centers - bfs - rating - shares | 454 | 64_saul_centers_bfs_rating |
| 65 | ryan - brothers - jingle - ihearadio - ajr | 316 | 65_ryan_brothers_jingle_ihearadio |
| 66 | food - wine - festival - chef - beach | 536 | 66_food_wine_festival_chef |
| 67 | arena - spos - pumpkins - center - smashing | 1107 | 67_arena_spos_pumpkins_center |
| 68 | her - max - ava - jingle - ihearadio | 636 | 68_her_max_ava_jingle |
| 69 | montrose - environmental - meg - group - rating | 420 | 69_montrose_environmental_meg_group |
| 70 | lauv - song - his - photo - jingle | 347 | 70_lauv_song_his_photo |
| 71 | niall - harry - album - circus - direction | 344 | 71_niall_harry_album_circus |
| 72 | flute - lizzo - library - crystal - congress | 396 | 72_flute_lizzo_library_crystal |
| 73 | alamos - agi - gold - newswire - globe | 1246 | 73_alamos_agi_gold_newswire |
| 74 | capitals - sharks - nhl - expe - odds | 516 | 74_capitals_sharks_nhl_expe |
| 75 | 07 - 06 - center - arena - paramore | 571 | 75_07_06_center_arena |
| 76 | dcfc - tritium - chargers - rating - limited | 698 | 76_dcfc_tritium_chargers_rating |
| 77 | groove - engagement - sales - platform - salesforce | 288 | 77_groove_engagement_sales_platform |
| 78 | album - billboard - songs - cha - swift | 354 | 78_album_billboard_songs_cha |
| 79 | google - viual - card - you - your | 962 | 79_google_viual_card_you |
| 80 | train - album - am - gold - ihearadio | 1194 | 80_train_album_am_gold |
| 81 | ihearadio - chili - peppers - hair - he | 540 | 81_ihearadio_chili_peppers_hair |
| 82 | dj - baseball - ice - hockey - football | 381 | 82_dj_baseball_ice_hockey |
| 83 | dhabi - abu - adcb - aed1 - al | 749 | 83_dhabi_abu_adcb_aed1 |
| 84 | salle - la - saint - joseph - explorers | 360 | 84_salle_la_saint_joseph |
| 85 | holiday - day - open - closed - holidays | 254 | 85_holiday_day_open_closed |
| 86 | iheacountry - festival - country - ihearadio - austin | 976 | 86_iheacountry_festival_country_ihearadio |
| 87 | foods - simply - smpl - good - kilts | 1490 | 87_foods_simply_smpl_good |
| 88 | parking - http - howard - city - divisons | 387 | 88_parking_http_howard_city |
| 89 | bellamy - delonge - extraterrestrial - ego - alter | 373 | 89_bellamy_delonge_extraterrestrial_ego |
| 90 | rhett - his - country - thomas - album | 289 | 90_rhett_his_country_thomas |
| 91 | lgbt - prnewswire - national - nglcc - wbenc | 4136 | 91_lgbt_prnewswire_national_nglcc |
| 92 | tires - tire - goodyear - save - walma | 912 | 92_tires_tire_goodyear_save |
| 93 | customer - marketing - business - customers - your | 453 | 93_customer_marketing_business_customers |
| 94 | twitter - musk - that - to - elon | 844 | 94_twitter_musk_that_to |
| 95 | cincinnati - fhlb - results - unaudited - prnewswire | 1539 | 95_cincinnati_fhlb_results_unaudited |
| 96 | accurate - complete - securities - reader - necessarily | 581 | 96_accurate_complete_securities_reader |
| 97 | 22 - arena - center - vengeance - viva | 383 | 97_22_arena_center_vengeance |
| 98 | button - mobile - marketers - commerce - platform | 733 | 98_button_mobile_marketers_commerce |
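To put the frequencies in context, it can help to express them as a share of the training corpus. The sketch below does this for a handful of rows copied from the table, using the stated number of training documents as the denominator. It assumes the frequency column counts documents; the names `SAMPLE_COUNTS` and `topic_shares` are illustrative.

```python
from typing import Dict, List, Tuple

TOTAL_DOCS = 828299  # "Number of training documents" from this card

# A few (topic ID: frequency) rows copied from the overview table.
SAMPLE_COUNTS: Dict[int, int] = {2: 171217, 1: 102457, 6: 67826, 4: 57205, -1: 546}


def topic_shares(counts: Dict[int, int], total: int) -> List[Tuple[int, float]]:
    """Return (topic ID, share of all documents), largest share first."""
    if total <= 0:
        raise ValueError("total must be positive")
    shares = [(topic, count / total) for topic, count in counts.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


for topic, share in topic_shares(SAMPLE_COUNTS, TOTAL_DOCS):
    print(f"topic {topic:>3}: {share:.1%}")
```

With the loaded model, the full set of counts should be available from `topic_model.get_topic_freq()`, which returns a table of topic IDs and counts.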

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 500
  • n_gram_range: (1, 1)
  • nr_topics: 100
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
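As a rough illustration, the hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. This is a sketch, not a reproduction of the original training run: the card does not list the embedding model, UMAP, HDBSCAN, or representation settings, so those fall back to whatever you supply or to BERTopic's defaults. The helper names `resolve_params` and `build_topic_model` are hypothetical. A `language` of `None` is typically what BERTopic records when an explicit embedding model was passed, so the sketch drops it when no embedding model is given.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list on this card.
TRAINING_HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 500,
    "n_gram_range": (1, 1),
    "nr_topics": 100,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def resolve_params(embedding_model: Any = None, **overrides: Any) -> Dict[str, Any]:
    """Merge the card's hyperparameters with optional overrides."""
    params = {**TRAINING_HYPERPARAMETERS, **overrides}
    if embedding_model is not None:
        params["embedding_model"] = embedding_model
    elif params.get("language") is None:
        # No embedding model supplied: let BERTopic use its default language.
        params.pop("language", None)
    return params


def build_topic_model(embedding_model: Any = None, **overrides: Any):
    """Create an unfitted BERTopic instance (assumption: not the author's exact setup)."""
    from bertopic import BERTopic  # lazy import; requires `pip install -U bertopic`

    return BERTopic(**resolve_params(embedding_model, **overrides))
```

Calling `build_topic_model().fit_transform(docs)` on your own documents would then train a new model with the same top-level settings.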

Framework versions

  • Numpy: 1.26.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.5
  • Pandas: 2.0.3
  • Scikit-Learn: 1.4.1.post1
  • Sentence-transformers: 2.5.1
  • Transformers: 4.39.1
  • Numba: 0.59.1
  • Plotly: 5.20.0
  • Python: 3.11.8
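Because saved models can be sensitive to library versions, it may be useful to compare a local environment against the list above. The following sketch uses only the standard library; the mapping from the names on this card to PyPI distribution names (for example, UMAP to `umap-learn`) and the helper names are assumptions made for illustration.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions from "Framework versions", keyed by PyPI distribution name.
EXPECTED_VERSIONS: Dict[str, str] = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "2.0.3",
    "scikit-learn": "1.4.1.post1",
    "sentence-transformers": "2.5.1",
    "transformers": "4.39.1",
    "numba": "0.59.1",
    "plotly": "5.20.0",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs to (expected, installed)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in version_mismatches(EXPECTED_VERSIONS).items():
    print(f"{package}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the local environment differs from the one recorded here.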