INDEX
Explanations
references to political power structures or entities
variations of the word "put."
Negative Logits
#$          -0.72
compr       -0.72
alty        -0.64
issance     -0.63
continents  -0.62
peak        -0.60
purified    -0.60
ende        -0.59
Ħ¢          -0.59
Sins        -0.58
Positive Logits
ierrez      1.41
tle         1.19
anamo       1.07
iful        1.04
ifully      0.99
ted         0.95
opian       0.94
opia        0.92
ting        0.90
cheon       0.88
Activations Density: 0.011%