Explanations
references to notable figures or entities in specific contexts
references to artistic works or events
Negative Logits
(>            -0.95
(<            -0.86
sbm           -0.86
UNCLASSIFIED  -0.83
nor           -0.78
soever        -0.73
ascript       -0.72
lees          -0.72
(−            -0.71
20439         -0.71
Positive Logits
downright     0.91
prank         0.90
revenge       0.86
somet         0.84
kicker        0.79
awfully       0.78
trick         0.77
booze         0.77
adorable      0.77
naughty       0.75
Activations Density 0.903%