INDEX
Explanations
occurrences of the word "the."
Negative Logits
ohan        -0.17
abyrinth    -0.16
enden       -0.16
above       -0.15
unsch       -0.14
rier        -0.14
STITUTE     -0.14
šov         -0.14
otte        -0.14
é£          -0.14
Positive Logits
only        0.35
ONLY        0.29
only        0.27
brain       0.26
result      0.26
Only        0.25
oldest      0.25
subject     0.24
sole        0.23
second      0.23
Activations Density: 0.260%