INDEX
Explanations
phrases related to various topics such as community issues, health care reform, and economic policies
Negative Logits
Lauder        -0.87
Reviewer      -0.77
spin          -0.70
smear         -0.69
fman          -0.67
resorts       -0.66
Britons       -0.65
shifts        -0.65
Authorities   -0.65
Numbers       -0.65
Positive Logits
atisf         1.06
selves        1.02
happening     1.00
uddenly       1.00
ought         0.99
̶             0.99
kaya          0.98
omething      0.97
lightly       0.97
happened      0.94
Activations Density 0.577%