INDEX
Explanations
periods at the end of sentences
Negative Logits
ourselves    -0.15
[this        -0.14
hran         -0.14
----------------------------------------------------------------------------------------------------------------    -0.14
ắt           -0.13
edom         -0.13
à¥ĩय         -0.13
orer         -0.13
steder       -0.13
anmar        -0.13
Positive Logits
.That        0.17
last         0.16
That         0.15
earlier      0.15
That         0.14
officials    0.14
éĤ£          0.14
.Restrict    0.14
itu          0.14
ulis         0.14
Activations Density 0.095%