INDEX
Explanations
references to former roles or statuses of individuals
Negative Logits
aware          -0.17
ian            -0.16
opp            -0.14
mey            -0.14
eling          -0.14
_EXTENSIONS    -0.14
aps            -0.14
¸ı             -0.14
ami            -0.13
Aware          -0.13
Positive Logits
/current       0.31
/new           0.20
yme            0.18
/original      0.18
mente          0.18
lies           0.16
lad            0.16
theless        0.15
湯             0.15
ly             0.15
Activations Density 0.026%