Explanations
pronouns and references to the speaker and their group
Negative Logits
čka           -0.17
IVA           -0.16
iaux          -0.15
WithValue     -0.15
ossa          -0.14
istrovství    -0.14
regor         -0.14
rena          -0.14
mlink         -0.14
trap          -0.14
Positive Logits
DT            0.16
ONEY          0.16
-fw           0.15
pin           0.15
CW            0.15
Gel           0.14
DT            0.14
Hubb          0.14
zi            0.14
-utils        0.14
Activations Density: 0.001%