INDEX
Explanations
specific numerical values or quantities in the text
Negative Logits
Token       Logit
laus        -0.16
apt         -0.16
Harden      -0.15
iffe        -0.15
otope       -0.15
Levine      -0.14
emm         -0.14
ibility     -0.14
icy         -0.14
ible        -0.14
Positive Logits
Token       Logit
felt        0.17
lessly      0.16
/th         0.16
islav       0.16
748         0.15
amel        0.15
0           0.15
919         0.15
924         0.14
łe          0.14
Activations Density 0.101%