INDEX
Explanations
words related to scientific topics and controversial issues
phrases asserting the presence or importance of various issues, often contrasting them
Negative Logits
Works      -0.67
Ends       -0.67
Believe    -0.64
ools       -0.64
Moines     -0.63
Janeiro    -0.62
sburg      -0.62
Supports   -0.61
ieves      -0.61
ACTIONS    -0.60
Positive Logits
poised          1.05
Reviewer        1.02
indispensable   1.00
arguably        1.00
prominently     0.98
intrinsically   0.96
insepar         0.95
synonymous      0.95
regarded        0.93
overshadowed    0.93
Activations Density: 0.485%