INDEX
Explanations
references to groups of people or identities associated with specific social or professional roles
Negative Logits
argent      -0.17
nnen        -0.15
moid        -0.14
Shaw        -0.14
implify     -0.13
erland      -0.13
MagicMock   -0.13
jis         -0.13
ifes        -0.13
acades      -0.13
Positive Logits
myself      0.15
)(_         0.15
ande        0.15
uzu         0.14
Domin       0.14
dio         0.14
oux         0.14
/fa         0.13
Dominion    0.13
ghan        0.13
Activations Density 0.072%