INDEX
Explanations
references to political positions or ideologies
references to political extremes or factions
Negative Logits
Picks       -0.77
Peel        -0.71
urated      -0.61
Bean        -0.60
Rankings    -0.60
ifully      -0.60
Ortiz       -0.59
PROGRAM     -0.58
Mechanics   -0.58
Puzzles     -0.58
Positive Logits
thing       1.02
flung       0.98
med         0.97
aday        0.97
fetched     0.96
away        0.92
riers       0.92
rowing      0.90
rier        0.85
seeing      0.78
Activations Density: 0.025%