Explanations
instances of personal pronouns and expressions of intention or capability
Negative Logits
ëģĶ        -0.15
swear      -0.14
ocab       -0.14
reib       -0.13
inspace    -0.13
opy        -0.13
ensch      -0.13
ĵĺ         -0.13
éļ         -0.13
Ā          -0.13
Positive Logits
already      0.26
Already      0.22
already      0.21
Already      0.20
has          0.19
cannot       0.17
have         0.16
Weg          0.16
certainly    0.15
å·²ç»ı       0.15
Activations Density: 0.258%