INDEX
Explanations
phrases related to feelings of discomfort or surprise
expressions of surprise or disbelief
Negative Logits
Financial         -0.68
Applications      -0.68
automobile        -0.67
à¨                -0.65
verning           -0.64
ourses            -0.61
Therefore         -0.60
guidance          -0.60
Economic          -0.59
policymakers      -0.59
Positive Logits
eeee              1.03
gotta             0.96
oooooooo          0.91
oooooooooooooooo  0.87
fucking           0.87
kinda             0.86
kidding           0.85
fuckin            0.83
oooo              0.83
?!                0.83
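Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes NumPy arrays for the hypothetical inputs `w_dec_row` (the feature's decoder direction) and `w_u` (the unembedding matrix). It ignores any final layer norm and is not necessarily how the values above were computed.

```python
import numpy as np

def feature_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank vocabulary tokens by the feature's direct effect on their logits.

    Assumed shapes: w_dec_row is [d_model], w_u is [d_model, vocab_size],
    and vocab is a list of vocab_size token strings.
    """
    effects = w_dec_row @ w_u          # one score per vocabulary token
    order = np.argsort(effects)        # indices sorted by ascending score
    # Most negative scores first, then most positive scores first.
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

With a toy identity unembedding, the scores equal the decoder entries, so `feature_logit_effects(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4), ["a", "b", "c", "d"], k=2)` ranks `b` and `d` as the most negative and `c` and `a` as the most positive.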
Activation Density: 1.010%
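Activation density is usually reported as the percentage of tokens on which the feature fires. A value of 1.010% therefore means the feature is active on roughly 1 token in 100. The sketch below assumes "fires" means an activation strictly above a hypothetical `threshold` of 0.0. The exact cutoff and token sample used for the figure above are not stated.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    # Count tokens where the feature is active, then scale to a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

For example, a feature that is nonzero on 2 of 10 tokens has a density of 20.0%.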