Explanations
names of individuals and their roles or titles
Negative Logits
himself    -0.16
mell       -0.14
outer      -0.14
_outer     -0.14
outer      -0.14
exact      -0.14
bett       -0.14
icap       -0.13
loc        -0.13
ļ          -0.13
Positive Logits
ture       0.18
ovna       0.17
herself    0.17
lify       0.17
stery      0.16
ÑģÑĤала     0.15
Pregn      0.15
.Encode    0.15
oreach     0.14
spar       0.14
Activations Density 0.222%