INDEX
Explanations
contractions and possessive forms in the text
Negative Logits
dana
-0.17
地方
-0.16
声音
-0.16
手
-0.16
’n
-0.16
情况
-0.15
shaw
-0.15
auss
-0.14
acons
-0.14
レイ
-0.14
Positive Logits
ezier
0.18
richt
0.17
ullivan
0.17
outh
0.15
ετ
0.14
ed
0.14
baum
0.14
ルド
0.14
evin
0.14
phalt
0.13
Activations Density 0.028%