INDEX
Explanations
words indicating continuance or persistence
Negative Logits
Already    -0.17
already    -0.16
oland      -0.16
already    -0.16
portun     -0.15
Already    -0.15
már        -0.15
alat       -0.15
テル       -0.15
гал        -0.15
Positive Logits
ders       0.31
constant   0.24
true       0.23
intact     0.23
unchanged  0.22
(ed        0.20
faithful   0.20
committed  0.19
steadfast  0.18
alive      0.18
Activations Density 0.033%