INDEX
Explanations
contractions and possessive forms in the text
Negative Logits
Hers      -0.15
icide     -0.14
unce      -0.14
alte      -0.14
982       -0.13
udget     -0.13
acity     -0.13
985       -0.13
_macros   -0.13
C         -0.13
Positive Logits
ãĥ¼ãĥĬ    0.17
yaw       0.17
ênh       0.16
LEM       0.15
Bram      0.15
embr      0.15
auge      0.14
mÄĽ       0.14
oids      0.14
lund      0.13
Activations Density: 0.092%