INDEX
Explanations
phrases indicating subjectivity or personal perception
Negative Logits
him          -0.16
themselves   -0.15
them         -0.15
ÙĴÙĩ         -0.14
icare        -0.14
himself      -0.14
eux          -0.14
ighbors      -0.14
ught         -0.13
agues        -0.13
Positive Logits
there        0.29
clear        0.26
apparent     0.24
like         0.24
we           0.23
likely       0.23
they         0.21
evident      0.21
counter      0.20
that         0.19
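The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to compute such lists, assumed here rather than confirmed by this page, is to project the feature's decoder vector through the model's unembedding matrix and sort the resulting per-token scores. The sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and a toy two-dimensional model whose numbers were picked only to reproduce four of the values listed above; they are not real weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: the effect is the decoder vector projected through the
# unembedding matrix (d_model rows x vocab columns). All data is toy data.

def logit_effects(decoder_vec, unembed):
    """Return one score per vocab token: dot(decoder_vec, unembed[:, t])."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]

def top_and_bottom(tokens, scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    tokens = ["there", "clear", "him", "them"]
    decoder_vec = [1.0, -0.5]              # toy d_model = 2
    unembed = [
        [0.3, 0.2, -0.1, -0.1],            # row for model dimension 0
        [0.02, -0.12, 0.12, 0.1],          # row for model dimension 1
    ]
    scores = logit_effects(decoder_vec, unembed)
    positive, negative = top_and_bottom(tokens, scores, k=2)
    print("Positive:", [(t, round(s, 2)) for t, s in positive])
    print("Negative:", [(t, round(s, 2)) for t, s in negative])
```

With a real model the same ranking would run over the full vocabulary, which is why subword fragments such as "ighbors" or "agues" can appear among the top entries.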
Activation density: 0.030%
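Activation density is read here as the share of token positions on which the feature fires; that definition (activation strictly above a threshold of 0) is an assumption, not something stated on this page. Under it, 0.030% means roughly 3 active positions per 10,000 tokens. The function name `activation_density` and the toy data below are hypothetical.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# whose activation exceeds a threshold (assumed definition, threshold 0.0).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

if __name__ == "__main__":
    # Toy data: the feature fires on 3 of 10,000 token positions.
    acts = [0.0] * 10_000
    for pos in (12, 4_051, 9_998):
        acts[pos] = 1.7
    print(f"Activation density: {activation_density(acts):.3%}")
```

A density this low marks the feature as sparse: it stays silent on almost all text and activates only in narrow contexts.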