INDEX
Explanations
repeated and emphasized occurrences of the word "the."
Negative Logits
//{{          -0.13
controvers    -0.12
prostituer    -0.12
tá»Ń          -0.11
/respond      -0.11
.ssl          -0.11
gos           -0.11
èĸ            -0.11
isque         -0.11
baugh         -0.11
Positive Logits
same          0.22
ses           0.21
exact         0.20
ese           0.18
above         0.18
entire        0.18
amount        0.17
person        0.17
actual        0.17
original      0.16
Activations Density 0.988%