INDEX
Explanations
specific issues or topics mentioned in a text
references to pressing societal issues
Negative Logits
urses      -0.82
achev      -0.79
bsite      -0.79
ellow      -0.77
alt        -0.75
bered      -0.74
jin        -0.74
berman     -0.73
glas       -0.73
htaking    -0.72
Positive Logits
issue          0.84
Issue          0.83
naires         0.81
DonaldTrump    0.74
plag           0.72
Issues         0.70
Issue          0.70
HRC            0.70
Problem        0.69
tracker        0.68
Activations Density: 0.039%