INDEX
Explanations
names or terms related to political figures or events
Negative Logits
sburgh      -1.01
ateral      -0.92
ships       -0.85
suit        -0.84
士          -0.82
ipop        -0.79
earable     -0.78
dress       -0.78
itarian     -0.78
irtual      -0.77
Positive Logits
rique       1.16
rament      1.06
cest        1.03
rier        1.00
ces         0.96
rous        0.95
ris         0.94
riers       0.92
acle        0.92
rel         0.91
Activations Density: 2.334%