INDEX
Explanations
repeated occurrences of the word "the."
Negative Logits
edList          -0.15
Reach           -0.15
ocale           -0.15
ussen           -0.15
atak            -0.14
DAMAGES         -0.14
нила            -0.14
'''č↵           -0.14
CONSEQUENTIAL   -0.14
vÃŃ             -0.14
Positive Logits
same            0.24
equivalent      0.22
same            0.20
beginnings      0.19
opportunity     0.18
ability         0.18
following       0.17
عÛĮ             0.16
option          0.16
même            0.16
Activations Density 0.613%
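
The density figure above can be read as a simple ratio. The sketch below assumes that "activations density" means the share of sampled tokens on which the feature fires at all (activation above zero); the page itself does not define the term. The function name, the threshold parameter, and the sample data are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is counted per token over a sample of activations.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 613 of 100,000 tokens.
sample = [1.0] * 613 + [0.0] * 99_387
print(f"{activation_density(sample):.3f}%")  # prints "0.613%"
```

Under this reading, a density of 0.613% corresponds to roughly 6 active tokens per 1,000 sampled.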