INDEX
Explanations
the title of movies within a sentence
quotation marks in the text
Negative Logits
Token        Logit
matter       -0.83
grasp        -0.75
batter       -0.73
pity         -0.73
peg          -0.72
pillar       -0.72
barr         -0.72
prag         -0.72
affiliate    -0.70
prey         -0.69
Positive Logits
Token        Logit
Big          1.19
Morning      1.19
their        1.15
Friends      1.14
Operation    1.14
Untitled     1.13
Bad          1.13
classic      1.12
Golden       1.12
Ultimate     1.11
Activations Density: 0.089%