Explanations
references to the publication "National Review."
references to specific publications or reviews
Negative Logits
Token       Logit
llular      -0.76
tyr         -0.67
inav        -0.66
stem        -0.66
iris        -0.64
othing      -0.63
mers        -0.63
JPM         -0.62
ignty       -0.62
challeng    -0.62
Positive Logits
Token       Logit
Review      0.88
er          0.86
Journal     0.82
aire        0.80
ers         0.80
Transcript  0.76
Reviews     0.76
ered        0.75
eman        0.75
Papers      0.73
Activations Density: 0.025%