Explanations
Instances of self-reference and expressions of personal opinion
Negative Logits
breadcrumbs   -0.15
Understand    -0.15
ovich         -0.14
šak           -0.14
гл            -0.14
sounds        -0.14
пад           -0.14
hall          -0.14
arat          -0.13
igest         -0.13
Positive Logits
comparison    0.18
recalled      0.18
argument      0.18
nhỼ           0.17
remembered    0.16
comparison    0.16
Comparison    0.16
onder         0.16
reminder      0.16
Comparison    0.16
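The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting scores. The minimal sketch below uses a hypothetical two-dimensional direction, a hypothetical unembedding, and a four-token toy vocabulary; `logit_effects` and `top_and_bottom` are illustrative names, not a library API.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Project a (hypothetical) feature decoder direction through a
    (hypothetical) unembedding matrix W_U.

    decoder_dir: list of d_model floats
    unembed: d_model rows, each a list of len(vocab) floats
    Returns one (token, score) pair per vocabulary entry.
    """
    scores = [0.0] * len(vocab)
    # score_j = sum_i decoder_dir[i] * W_U[i][j]
    for d_i, row in zip(decoder_dir, unembed):
        for j, w in enumerate(row):
            scores[j] += d_i * w
    return list(zip(vocab, scores))


def top_and_bottom(pairs, k):
    """Return the k highest-scoring pairs (descending) and the k lowest (ascending)."""
    ranked = sorted(pairs, key=lambda p: p[1])
    return ranked[-k:][::-1], ranked[:k]


if __name__ == "__main__":
    vocab = ["comparison", "recalled", "sounds", "hall"]  # toy vocabulary
    direction = [1.0, -0.5]                               # toy decoder direction
    w_u = [[0.5, 0.25, -0.25, 0.0],
           [0.25, -0.5, 0.25, 0.5]]                       # toy unembedding
    top, bottom = top_and_bottom(logit_effects(direction, w_u, vocab), 2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` avoids a full sort.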
Activation Density: 0.029%
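Activation density is reported as a percentage of tokens. Assuming it means the share of sampled tokens on which the feature's activation is strictly positive (the exact threshold and token sample are not stated here), 0.029% corresponds to roughly 29 firing tokens per 100,000. A minimal sketch with a hypothetical `activation_density` helper and synthetic activations:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" = count(activation > threshold) / total tokens.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Synthetic data: 29 firing tokens out of 100,000.
    acts = [0.0] * 99_971 + [1.2] * 29
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```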