INDEX
Explanations
references to progress and progressive ideologies
Negative Logits
upo      -0.17
indow    -0.17
pais     -0.15
ÑĨе      -0.15
icina    -0.15
strap    -0.15
ijing    -0.15
unami    -0.14
entai    -0.14
aces     -0.14
Positive Logits
ions       0.38
ional      0.36
ion        0.36
ively      0.35
ivism      0.35
ive        0.30
iveness    0.30
sing       0.28
ives       0.27
ivity      0.27
Activations Density 0.019%
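
An activation density figure like this is usually read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading. It is an assumption, not this dashboard's confirmed definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, the feature fires on 2 of them.
acts = [0.0] * 10_000
acts[17] = 1.3
acts[4242] = 0.4

density = activation_density(acts)
print(f"Activations Density {density:.3%}")
```

A density this low means the feature is sparse: it is silent on almost every token, which fits a feature tied to a narrow family of word endings.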