Explanations
phrases related to financial situations and consequences
negative or critical statements about various topics
Negative Logits
REDACTED        -0.67
osure           -0.59
nutshell        -0.59
WER             -0.52
atus            -0.51
soDeliveryDate  -0.50
¬¼              -0.50
retrospect      -0.50
ural            -0.50
endum           -0.50
Positive Logits
whereas         0.78
often           0.74
Conversely      0.69
selves          0.64
Especially      0.63
regardless      0.63
irrespective    0.63
They            0.63
moreover        0.62
beware          0.62
Activations Density 0.804%