INDEX
Explanations
activations related to political terms
occurrences of the letter "P"
Negative Logits
adm         -0.76
susp        -0.69
ģĸ          -0.69
diplom      -0.68
Aval        -0.67
Pry         -0.67
phyl        -0.66
wip         -0.65
differe     -0.65
transports  -0.64
Positive Logits
redict      1.39
ossible     1.35
ossession   1.28
aired       1.25
ractical    1.21
ossibly     1.20
odcast      1.20
overty      1.19
ierce       1.18
ardon       1.18
Activations Density 0.056%