Explanations
names of characters or individuals
mentions of specific character names from popular media
Negative Logits
Token      Logit
ocrats     -0.78
utory      -0.77
-+-+       -0.74
anooga     -0.73
ENTION     -0.70
ocracy     -0.69
龍契士     -0.69
urated     -0.68
Dominion   -0.68
otaur      -0.66
Positive Logits
Token      Logit
kj         1.20
Myster     1.01
ldom       0.85
Rey        0.84
ulic       0.84
senal      0.79
zin        0.74
uling      0.72
rio        0.70
issance    0.69
Activations Density 0.009%