Explanations
second-person references and personal connections in the text
Negative Logits
Token        Logit
ACHI         -0.15
pora         -0.14
redicate     -0.14
lady         -0.14
agna         -0.14
themselves   -0.14
ehr          -0.14
aldo         -0.14
lena         -0.14
à¸Ĺร         -0.13
Positive Logits
Token        Logit
oneself      0.19
alex         0.15
ixel         0.15
yourself     0.14
.TabStop     0.14
_threads     0.14
surviv       0.13
ledon        0.13
ATEST        0.13
jav          0.13
Activations Density 0.403%