INDEX
Explanations
instances of the word "it" in various contexts
Negative Logits
hill     -0.17
hand     -0.16
h        -0.16
ishly    -0.15
hana     -0.14
erif     -0.14
wan      -0.14
line     -0.14
e        -0.14
er       -0.13
Positive Logits
iner     0.32
zelf     0.25
/th      0.23
chy      0.23
inerary  0.22
ches     0.22
/her     0.21
же       0.21
unes     0.20
self     0.18
Activations Density: 0.189%