INDEX
Explanations
proper nouns
words and terms related to power, governance, and authority
Negative Logits
roman     -0.68
erb       -0.67
CP        -0.67
COM       -0.66
MP        -0.64
rices     -0.64
AW        -0.63
eb        -0.63
Mob       -0.63
CSS       -0.62
Positive Logits
theless   0.78
abwe      0.69
Ó         0.68
udence    0.64
henko     0.64
ruction   0.63
ensional  0.62
ãĤ£       0.62
zyk       0.62
sson      0.61
Activations Density: 0.159%