INDEX
Explanations
Instances of the words "end" and "ended".
Negative Logits
away      -0.16
331       -0.16
chester   -0.15
laus      -0.15
atak      -0.15
mo        -0.14
ÙĨدÙĩ     -0.14
çi        -0.14
OX        -0.14
forme     -0.14
Positive Logits
up        0.36
-up       0.26
.up       0.18
ended     0.18
ow        0.17
ëĵĿ       0.17
up        0.17
elman     0.17
orses     0.17
-Up       0.17
Activations Density 0.014%