INDEX
Explanations
names of political figures or terms related to political events
New Auto-Interp
Negative Logits
glers       -0.94
ahime       -0.75
istically   -0.66
[|          -0.66
ERY         -0.65
VILLE       -0.65
phal        -0.62
beit        -0.62
Cage        -0.61
Leap        -0.61
Positive Logits
orters      1.50
rint        1.36
orter       1.30
rieve       1.28
utations    1.25
ublic       1.22
utation     1.21
ository     1.20
orted       1.16
atri        1.16
Activation Density: 0.510%