INDEX
Explanations
details related to political figures and events
Negative Logits
Novel       -0.60
Pony        -0.60
Cipher      -0.60
Drift       -0.59
Aer         -0.58
Dance       -0.58
ETS         -0.56
puter       -0.56
Lys         -0.56
juggling    -0.55
Positive Logits
chard       1.29
nam         1.21
Else        1.14
ifice       1.08
acles       1.06
acle        1.04
nery        1.02
acular      1.00
chid        1.00
ific        0.99
Activations Density: 3.665%