---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:183
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: '14 William O. Douglas, quoted in Charles Hurd, Film Booking Issue
    Ordered Reopened,” New York Times, May 4, 1948, 1. 15 Movie Crisis Laid to Video
    Inroads And Dwindling of Foreign Market, New York Times, February 27, 1949, F1.
    For details on the lawsuit and its effects, see Arthur De Vany and Henry McMillan,
    Was the Antitrust Action that Broke Up the Movie Studios Good for the Movies?
    Evidence from the Stock Market. American Law and Economics Review 6, no. 1 (2004):
    135-53; and J.C. Strick, The Economics of the Motion Picture Industry: A Survey,
    Philosophy of the Social Sciences 8, no. 4 (December 1978): 406-17. 16 The Hollywood
    feature films for which Eisler provided music are Hangmen Also Die (1942), None
    But the Lonely Heart (1944), Jealousy (1945), The Spanish Main (1945); A Scandal
    in Paris (1946), Deadline at Dawn (1946), Woman on the Beach (1947), and So Well
    Remembered (1947). Most of these are middle-of-the-road genre pieces, but the
    first NOTES 267'
  sentences:
  - What is the opinion of Ernest Irving, a pioneer of British film music, on the
    overall quality of American film music?
  - What is the title of the 2007 film directed by David Fincher, produced by Michael
    Medavoy, and featuring a storyline based on a real-life serial killer, as mentioned
    in the provided context information?
  - What was the primary reason behind the lawsuit that led to the breakup of the
    movie studios, as suggested by the article in the New York Times on February 27,
    1949?
- source_sentence: 'But Gorbman (who like Flinn and Kalinak approached film music
    from a formal background not in musicology but in literary criticism) was certainly
    not the first scholar engaged in so-called film studies44 to address the role
    that extra-diegetic music played in classical-style films. Two years before Gorbman''s
    book was published, the trio of Bordwell, Staiger, and Thompson brought out their
    monumental The Classical Hollywood Cinema: Film Style and Production to 1960.
    As noted above, and apropos of its title, the book focuses on filmic narrative
    style and the technical devices that made this style possible. In its early pages,
    however, it also contains insightful comments on classical cinema''s use of music.
    The book''s first music-related passage lays a foundation for Gorbman''s point
    about how a score might lend unity to a film by recycling distinctive themes that
    within the THE GOLDEN AGE OF FILM MUSIC, 1933-49 143'
  sentences:
  - What is the possible reason, as suggested by David Thomson, for why David Lean's
    filmmaking style may have declined after the movie "Summer Madness" (US, 1955)?
  - What shift in the portrayal of hard body male characters in film, as exemplified
    by the actors who played these roles in the 1980s and 1990s, suggests that societal
    expectations and norms may be changing?
  - What is the significance of the authors' formal background in literary criticism
    rather than musicology, as mentioned in the context of Gorbman's approach to film
    music?
- source_sentence: (1931); Georg Wilhelm Pabst's Kameradschaft (1931); Fritz Lang's
    M (1931) and Das Testament der Dr. Mabuse (1932); and Carl Theodor Dreyer's Vampyr
    (1932). These films’ subtle mix of actual silence with accompanying music and
    more or less realistic sound effects has drawn and doubtless will continue to
    draw serious analytical attention from film scholars.45 And even in their own
    time they drew due attention aplenty from critics of avant-garde persuasion.46
    The mere fact that these films differed from the sonic norm attracted the notice,
    if not always the praise, of movie reviewers for the popular press. Writing from
    London, a special correspondent for the New York Times observed that Hitchcock's
    Blackmail goes some way to showing how the cinematograph and the microphone can
    be mated without their union being forced upon the attention of a punctilious
    world as VITAPHONE AND MOVIETONE, 1926-8 101
  sentences:
  - What was the primary limitation that led to the failure of Edison's first Kinetophone,
    which was an early attempt at sound film featuring musical accompaniment?
  - What was the specific sonic approach employed by the mentioned films of Georg
    Wilhelm Pabst, Fritz Lang, and Carl Theodor Dreyer that drew serious analytical
    attention from film scholars?
  - What limitation in Martin Scorsese's background, as mentioned in the text, restricted
    his choice of subjects at this stage in his career?
- source_sentence: "39\tdivided into small, three-dimensional cubes known as volumetric\
    \ pixels, or voxels. When viewers are watching certain images, the voxel demonstrates\
    \ how these images in the movie are mapped into brain activity. Clips of the movie\
    \ are reconstructed through brain imaging and computer stimulation by associating\
    \ visual patterns in the movie with the corresponding brain activity. However,\
    \ these reconstructions are blurry and are hard to make because researchers say,\
    \ blood flow signals measured using fMRI change much more slowly than the neural\
    \ signals that encode dynamic information in movies. Psychology and neuroscience\
    \ professor, Jack Gallant explains in an interview that primary visual cortex\
    \ responds to the local features of the movie such as edges, colors, motion, and\
    \ texture but this part of the brain cannot understand the objects in the movie.\
    \ In addition, movies that show people are reconstructed with better accuracy\
    \ than abstract images. Using Neuroimaging For Entertainment Success Can brain\
    \ scans predict movie success in the box office? Two marketing researchers from\
    \ the Rotterdam School of Management devised an experiment by using EEG on participants.\
    \ EEG demonstrated that individual choice and box office success correlate with\
    \ different types of brain activity. From article, How Neuroimaging Can Save The\
    \ Entertainment Industry Millions of Dollars, it states, individual choice is\
    \ predicted best by high frontocentral beta activity, the choice of the general\
    \ population is predicted by frontal gamma activity. Perhaps, with quickly advanced\
    \ technology, predicting movie genre and plots that can hit the box office could\
    \ be successful. Neurocinema in Hollywood One strategy that helps filmmakers,\
    \ producers, and distributors to achieve global market success is by using fMRI\
    \ and EEG to make a better storyline, characters, sound effects, and other"
  sentences:
  - What significant change in the portrayal of Rocky's character is evident in the
    2015 movie Creed, as compared to the original 1976 film Rocky?
  - What factors led to the selection of the films "Spider-man" (2002), "Cars" (2006),
    and "Avatar" (2009) for the research project examining the relationship between
    film and society in the early 2000s?
  - What is the main reason why researchers find it challenging to reconstruct abstract
    images from movie clips using brain imaging and computer stimulation?
- source_sentence: "11\tdocumentary film so unpleasant when most had sat through horror\
    \ pictures that were appreciably more violent and bloody.  The answer that McCauley\
    \ came up with was that the fictional nature of horror films affords viewers a\
    \ sense of control by placing psychological distance between them and the violent\
    \ acts they have witnessed. Most people who view horror movies understand that\
    \ the filmed events are unreal, which furnishes them with psychological distance\
    \ from the horror portrayed in the film. In fact, there is evidence that young\
    \ viewers who perceive greater realism in horror films are more negatively affected\
    \ by their exposure to horror films than viewers who perceive the film as unreal\
    \ (Hoekstra, Harris, & Helmick, 1999). Four Viewing Motivations for Graphic Horror\
    \   According to Dr. Deirdre Johnston (1995) study Adolescents’ Motivations for\
    \ Viewing Graphic Horror of Human Communication Research there are four different\
    \ main reasons for viewing graphic horror. From the study of a small sample of\
    \ 220 American adolescents who like watching horror movies, Dr. Johnston reported\
    \ that: The four viewing motivations are found to be related to viewers’ cognitive\
    \ and affective responses to horror films, as well as viewers’ tendency to identify\
    \ with either the killers or victims in these films.\" Dr. Johnson notes that:\
    \  1) gore watchers typically had low empathy, high sensation seeking, and (among\
    \  males only) a strong identification with the killer, 2) thrill watchers typically\
    \ had  both high empathy and sensation seeking, identified themselves more with\
    \ the  victims, and liked the suspense of the film, 3) independent watchers typically\
    \ had  a  high empathy for the victim along with a high positive effect for overcoming\
    \  fear, and 4) problem watchers typically had high empathy for the victim but\
    \ were"
  sentences:
  - What was the name of the series published by Oliver Ditson from 1918-25 that contained
    ensemble music for motion picture plays?
  - What shift in the cultural, political, and social contexts of the 1980s and 1990s
    may have led to the deconstruction of the hard body characters portrayed by actors
    such as Stallone and Schwarzenegger in more recent movies?
  - What is the primary reason why viewers who perceive greater realism in horror
    films are more negatively affected by their exposure to horror films than viewers
    who perceive the film as unreal?
datasets:
- YxBxRyXJx/QAsimple_for_BGE_241019
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Movie Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.8205128205128205
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9743589743589743
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1.0
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8205128205128205
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.32478632478632485
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.20000000000000004
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10000000000000002
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8205128205128205
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9743589743589743
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1.0
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9207838928594967
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.8940170940170941
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8940170940170938
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.8461538461538461
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9230769230769231
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1.0
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8461538461538461
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.30769230769230776
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.20000000000000004
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10000000000000002
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8461538461538461
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9230769230769231
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1.0
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9233350110390831
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.8982905982905982
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8982905982905982
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.8461538461538461
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9230769230769231
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9487179487179487
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8461538461538461
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.30769230769230776
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.18974358974358976
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10000000000000002
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8461538461538461
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9230769230769231
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9487179487179487
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9234104189545929
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.898962148962149
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.898962148962149
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.7692307692307693
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8974358974358975
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9487179487179487
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9487179487179487
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7692307692307693
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.29914529914529925
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.18974358974358976
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09487179487179488
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7692307692307693
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8974358974358975
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9487179487179487
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9487179487179487
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8688480033444261
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.8418803418803418
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8443986568986569
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.5641025641025641
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8717948717948718
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9230769230769231
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9487179487179487
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5641025641025641
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2905982905982907
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.18461538461538465
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09487179487179488
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.5641025641025641
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8717948717948718
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9230769230769231
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9487179487179487
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.768187565996018
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.708119658119658
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7088711597999523
      name: Cosine Map@100
---

# BGE base Movie Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the [q_asimple_for_bge_241019](https://huggingface.co/datasets/YxBxRyXJx/QAsimple_for_BGE_241019) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [q_asimple_for_bge_241019](https://huggingface.co/datasets/YxBxRyXJx/QAsimple_for_BGE_241019)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
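
The three modules listed above can be read as: run the transformer, keep the first (`[CLS]`) token vector, then scale it to unit length. The following is a minimal pure-Python sketch of the last two steps, assuming toy 4-dimensional token vectors in place of the real 768-dimensional BERT outputs. The helper names `cls_pool` and `l2_normalize` are hypothetical and are not part of the Sentence Transformers API.

```python
import math


def cls_pool(token_embeddings):
    """Return the first token's vector, mirroring pooling_mode_cls_token=True."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit length, mirroring the Normalize() module."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy token embeddings: 3 tokens with 4 dimensions each (the real model uses 768).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # stand-in for the [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output is unit-length, the dot product of two sentence embeddings equals their cosine similarity, which matches the similarity function listed in the model description.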

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("YxBxRyXJx/bge-base-movie-matryoshka")
# Run inference
sentences = [
    '11\tdocumentary film so unpleasant when most had sat through horror pictures that were appreciably more violent and bloody.  The answer that McCauley came up with was that the fictional nature of horror films affords viewers a sense of control by placing psychological distance between them and the violent acts they have witnessed. Most people who view horror movies understand that the filmed events are unreal, which furnishes them with psychological distance from the horror portrayed in the film. In fact, there is evidence that young viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal (Hoekstra, Harris, & Helmick, 1999). Four Viewing Motivations for Graphic Horror   According to Dr. Deirdre Johnston (1995) study Adolescents’ Motivations for Viewing Graphic Horror of Human Communication Research there are four different main reasons for viewing graphic horror. From the study of a small sample of 220 American adolescents who like watching horror movies, Dr. Johnston reported that: The four viewing motivations are found to be related to viewers’ cognitive and affective responses to horror films, as well as viewers’ tendency to identify with either the killers or victims in these films." Dr. Johnson notes that:  1) gore watchers typically had low empathy, high sensation seeking, and (among  males only) a strong identification with the killer, 2) thrill watchers typically had  both high empathy and sensation seeking, identified themselves more with the  victims, and liked the suspense of the film, 3) independent watchers typically had  a  high empathy for the victim along with a high positive effect for overcoming  fear, and 4) problem watchers typically had high empathy for the victim but were',
    'What is the primary reason why viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal?',
    'What shift in the cultural, political, and social contexts of the 1980s and 1990s may have led to the deconstruction of the hard body characters portrayed by actors such as Stallone and Schwarzenegger in more recent movies?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
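
Since the model was trained with a Matryoshka objective and evaluated at 768, 512, 256, 128 and 64 dimensions, embeddings can in principle be shortened by keeping only their leading components. The sketch below illustrates that idea in pure Python, assuming toy 8-dimensional vectors as stand-ins for real model outputs. The helpers `truncate_and_normalize` and `cosine` are hypothetical names used for illustration only.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional stand-ins for the model's 768-dimensional embeddings.
query = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
passage = [0.5, 0.5, 0.5, 0.5, 0.1, -0.1, 0.1, -0.1]

for dim in (8, 4):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, len(q), round(cosine(q, p), 4))
```

Re-normalizing after truncation matters because a prefix of a unit vector is generally no longer unit-length. Recent versions of Sentence Transformers also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be a more convenient route than truncating by hand.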

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
| cosine_accuracy@1   | 0.8205     | 0.8462     | 0.8462     | 0.7692     | 0.5641     |
| cosine_accuracy@3   | 0.9744     | 0.9231     | 0.9231     | 0.8974     | 0.8718     |
| cosine_accuracy@5   | 1.0        | 1.0        | 0.9487     | 0.9487     | 0.9231     |
| cosine_accuracy@10  | 1.0        | 1.0        | 1.0        | 0.9487     | 0.9487     |
| cosine_precision@1  | 0.8205     | 0.8462     | 0.8462     | 0.7692     | 0.5641     |
| cosine_precision@3  | 0.3248     | 0.3077     | 0.3077     | 0.2991     | 0.2906     |
| cosine_precision@5  | 0.2        | 0.2        | 0.1897     | 0.1897     | 0.1846     |
| cosine_precision@10 | 0.1        | 0.1        | 0.1        | 0.0949     | 0.0949     |
| cosine_recall@1     | 0.8205     | 0.8462     | 0.8462     | 0.7692     | 0.5641     |
| cosine_recall@3     | 0.9744     | 0.9231     | 0.9231     | 0.8974     | 0.8718     |
| cosine_recall@5     | 1.0        | 1.0        | 0.9487     | 0.9487     | 0.9231     |
| cosine_recall@10    | 1.0        | 1.0        | 1.0        | 0.9487     | 0.9487     |
| **cosine_ndcg@10**  | **0.9208** | **0.9233** | **0.9234** | **0.8688** | **0.7682** |
| cosine_mrr@10       | 0.894      | 0.8983     | 0.899      | 0.8419     | 0.7081     |
| cosine_map@100      | 0.894      | 0.8983     | 0.899      | 0.8444     | 0.7089     |
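
In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.9744 / 3 ≈ 0.3248 for `dim_768`). That pattern is what one would expect if every query has exactly one relevant passage. The sketch below shows how these metrics relate under that assumption. The function `retrieval_metrics` and the rank values are hypothetical, and this is not the `InformationRetrievalEvaluator` implementation.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    `ranks` holds the 1-based rank of the single relevant passage per query.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r <= k)
    return {
        "accuracy": hits / n,          # share of queries with a hit in the top k
        "precision": hits / (n * k),   # one relevant item among k retrieved
        "recall": hits / n,            # one relevant item per query in total
        "mrr": sum(1.0 / r for r in ranks if r <= k) / n,
    }


# Hypothetical ranks for five queries (not taken from the evaluation data).
ranks = [1, 1, 2, 1, 4]

for k in (1, 3, 5):
    print(k, retrieval_metrics(ranks, k))
```

With a single relevant passage per query, precision@k is capped at 1/k, so the low precision@10 values in the table (at most 0.1) do not indicate poor retrieval.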

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### q_asimple_for_bge_241019

* Dataset: [q_asimple_for_bge_241019](https://huggingface.co/datasets/YxBxRyXJx/QAsimple_for_BGE_241019) at [66635cd](https://huggingface.co/datasets/YxBxRyXJx/QAsimple_for_BGE_241019/tree/66635cde6ada74a8cf5a84db10518119fc1c221d)
* Size: 183 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 183 samples:
  |         | positive                                                                             | anchor                                                                             |
  |:--------|:-------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                               | string                                                                             |
  | details | <ul><li>min: 191 tokens</li><li>mean: 356.1 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 36.04 tokens</li><li>max: 66 tokens</li></ul> |
* Samples:
  | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | anchor                                                                                                                                                                                                                 |
  |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>1	   Introduction  Why do we watch horror films? What makes horror films so exciting to watch? Why do our bodies sweat and muscles tense when we are scared? How do filmmakers, producers, sound engineers, and cinematographers specifically design a horror film? Can horror movies cause negative, lasting effects on the audience? These are some of the questions that are answered by exploring the aesthetics of horror films and the psychology behind horror movies.  Chapter 1, The Allure of Horror Film, illustrates why we are drawn to scary films by studying different psychological theories and factors. Ideas include: catharsis, subconscious mind, curiosity, thrill, escape from reality, relevance, unrealism, and imagination. Also, this chapter demonstrates why people would rather watch fiction films than documentaries and the motivations for viewing graphic horror.   Chapter 2, Mise-en-scène in Horror Movies, includes purposeful arrangement of scenery and stage properties of horror movie. Also...</code> | <code>What is the name of the emerging field of scientists and filmmakers that uses fMRI and EEG to read people's brain activity while watching movie scenes?</code>                                                   |
  | <code>3	   Chapter 1: The Allure of Horror Film Overview Although watching horror films can make us feel anxious and uneasy, we still continue to watch other horror films one after another. It is ironic how we hate the feeling of being scared, but we still enjoy the thrill. So why do we pay money to watch something to be scared?  Eight Theories on why we watch Horror Films  From research by philosophers, psychoanalysts, and psychologists there are theories that can explain why we are drawn to watching horror films. The first theory, psychoanalyst, Sigmund Freud portrays that horror comes from the “uncanny”  emergence of images and thoughts of the primitive id. The purpose of horror films is to highlight unconscious fears, desire, urges, and primeval archetypes that are buried deep in our collective subconscious  images of mothers and shadows play important roles because they are common to us all. For example, in Alfred Hitchcock's Psycho, a mother plays the role of evil in the main character...</code> | <code>What process, introduced by the Greek Philosopher Aristotle, involves the release of negative emotions through the observation of violent or scary events, resulting in a purging of aggressive emotions?</code> |
  | <code>5	principle unknowable (Jancovich, 2002, p. 35). This meaning, the audience already knows that the plot and the characters are already disgusting, but the surprises in the horror narrative through the discovery of curiosity should give satisfaction. Marvin Zuckerman (1979) proposed that people who scored high in sensation seeking scale often reported a greater interest in exciting things like rollercoasters, bungee jumping and horror films. He argued more individuals who are attracted to horror movies desire the sensation of experience. However, researchers did not find the correlation to thrill-seeking activities and enjoyment of watching horror films always significant.   The Gender Socialization theory (1986) by Zillman, Weaver, Mundorf and Aust exposed 36 male and 36 female undergraduates to a horror movie with the same age, opposite-gender companion of low or high initial appeal who expressed mastery, affective indifference, or distress. They reported that young men enjoyed the fi...</code> | <code>What is the proposed theory by Marvin Zuckerman (1979) regarding the relationship between sensation seeking and interest in exciting activities, including horror films?</code>                                  |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
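
The parameters above say that the base loss is applied once per embedding prefix length and the results are summed with equal weights. The following is a minimal pure-Python sketch of that idea, not the library's implementation. It assumes an in-batch-negatives ranking loss over scaled cosine similarities (the `scale=20.0` value is an assumption, not stated in this card), and the helper names `cosine`, `ranking_loss`, and `matryoshka_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates.

    positives[i] is the correct match for anchors[i]; every other
    positive in the batch acts as a negative. `scale` is an assumed value.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * ranking_loss(truncated_anchors, truncated_positives)
    return total
```

With the configuration listed above, `dims` would be `[768, 512, 256, 128, 64]` and `weights` would be `[1, 1, 1, 1, 1]`, so each prefix length contributes equally to the training signal.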

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
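
To make the interaction of these settings concrete, here is a small sketch of the effective batch size and of a cosine learning-rate schedule with linear warmup. It is a simplified illustration under stated assumptions, not the trainer's code: `num_devices` and `total_steps` are hypothetical inputs (neither is stated in this card), and the warmup step count is assumed to be the ceiling of `warmup_ratio * total_steps`.

```python
import math

# Values copied from the non-default hyperparameters listed above.
PER_DEVICE_TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 16
LEARNING_RATE = 2e-05
WARMUP_RATIO = 0.1


def effective_batch_size(num_devices=1):
    """Samples consumed per optimizer step (num_devices is an assumption)."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def lr_at_step(step, total_steps):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

On a single device, `effective_batch_size()` gives 32 × 16 = 512 samples per optimizer step, which is why large accumulation settings lead to few optimizer steps per epoch on a small dataset.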

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch   | Step  | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:-----:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 1.0     | 1     | 0.8987                 | 0.8983                 | 0.8835                 | 0.8419                 | 0.7773                |
| 2.0     | 2     | 0.9218                 | 0.9141                 | 0.9075                 | 0.8721                 | 0.8124                |
| 1.0     | 1     | 0.9218                 | 0.9141                 | 0.9075                 | 0.8721                 | 0.8124                |
| 2.0     | 2     | 0.9356                 | 0.9302                 | 0.9118                 | 0.8750                 | 0.8057                |
| **3.0** | **4** | **0.9302**             | **0.9233**             | **0.9234**             | **0.8783**             | **0.7759**            |
| 4.0     | 5     | 0.9208                 | 0.9233                 | 0.9234                 | 0.8688                 | 0.7682                |

* The bold row denotes the saved checkpoint.
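
The column names suggest NDCG@10 computed from cosine-similarity rankings at each truncated embedding size. As a reference for reading the numbers, the sketch below computes NDCG@10 with binary relevance in plain Python. The function names and the binary-relevance assumption are illustrative; the evaluator's exact implementation is not shown in this card.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, relevant_ids, k=10):
    """NDCG@k for one query, assuming binary relevance.

    ranked_doc_ids: document ids sorted by descending similarity to the query.
    relevant_ids: set of ids that are correct answers for the query.
    """
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_doc_ids]
    ideal = [1.0] * min(len(relevant_ids), k)  # best possible ordering
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For a query with one relevant document, the score is 1.0 when that document is ranked first and drops to 1 / log2(3) when it is ranked second; a dataset-level figure like those in the table would be the mean over all queries.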

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.46.3
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
