INDEX
Explanations
instances of the word "it"
Negative Logits
hill     -0.20
hana     -0.19
hip      -0.18
اÙĨÙĩ     -0.18
hood     -0.18
(es      -0.18
hift     -0.17
s        -0.17
house    -0.16
er       -0.16
Positive Logits
iner     0.52
unes     0.43
chy      0.43
/th      0.40
inerary  0.31
ches     0.29
ty       0.29
self     0.29
aly      0.28
SELF     0.28
Activations Density: 0.581%