INDEX
Explanations
abbreviated names or initials of individuals, potentially related to interviews or news articles
Negative Logits
holders      -0.96
holder       -0.81
Illum        -0.68
houses       -0.65
doors        -0.62
cluding      -0.61
constit      -0.58
rooting      -0.57
rights       -0.56
dictators    -0.56
Positive Logits
ournals      1.30
ealous       1.29
upiter       1.23
igsaw        1.21
unction      1.19
umbo         1.10
okers        1.06
itsu         1.04
utsu         1.04
ansson       1.03
Activations Density: 1.663%