INDEX
Explanations
first-person statements and expressions of past experiences or emotions
Negative Logits
not        -0.17
FE         -0.16
nto        -0.15
FE         -0.15
令         -0.14
gle        -0.14
ilo        -0.14
neither    -0.14
airo       -0.14
fe         -0.13
Positive Logits
might          0.20
might          0.20
surely         0.18
inv            0.16
somehow        0.16
SURE           0.15
ewis           0.15
.addHandler    0.15
éĵģ            0.14
maybe          0.14
Activations Density: 0.090%