Explanations
occurrences of the word "the" across various contexts
Negative Logits
erdale          -0.17
atak            -0.16
<translation    -0.16
etros           -0.16
æĺĮ             -0.15
najle           -0.15
.sz             -0.15
iverse          -0.15
ernet           -0.14
еÑĢÑĤа           -0.14
Positive Logits
equivalent      0.25
tail            0.21
span            0.20
start           0.18
end             0.17
same            0.17
height          0.16
ele             0.16
ait             0.16
middle          0.16
Activations Density: 0.187%