Explanations
phrases indicating regret or warning signs related to decisions
Negative Logits
zel        -0.16
descon     -0.14
.openg     -0.14
-wow       -0.13
endar      -0.13
.mx        -0.13
OfYear     -0.13
っ         -0.13
ampo       -0.13
òn         -0.13
Positive Logits
signs        0.84
Signs        0.71
sign         0.68
signals      0.59
sign         0.59
indicators   0.56
signal       0.55
indication   0.55
indications  0.55
Sign         0.54
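The logit lists above rank vocabulary tokens by how strongly this feature pushes them up or down. A common way to obtain such lists (assumed here, not confirmed by this page) is a logit-lens projection: multiply the feature's decoder direction by the model's unembedding matrix and take the top and bottom k entries. The function name `top_logit_tokens` and the toy inputs below are hypothetical, for illustration only.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction onto the vocabulary.

    decoder_dir: shape (d_model,), the feature's output direction (assumed available)
    W_U:         shape (d_model, n_vocab), the unembedding matrix (assumed available)
    vocab:       list of n_vocab token strings
    Returns (positive, negative), each a list of (token, logit) pairs.
    """
    logits = decoder_dir @ W_U          # one logit contribution per vocabulary token
    order = np.argsort(logits)          # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dimensional model with a 4-token vocabulary (made-up numbers).
vocab = ["signs", "sign", "zel", "ampo"]
W_U = np.array([[0.84, 0.68, -0.16, -0.13],
                [0.00, 0.00,  0.00,  0.00]])
decoder_dir = np.array([1.0, 0.0])
pos, neg = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
```

Here `pos` holds the two most-promoted tokens ("signs", then "sign") and `neg` the two most-suppressed ("zel", then "ampo").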
Activation Density: 0.279%
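Activation density is usually read as the share of sampled tokens on which the feature is active. The sketch below assumes that definition, with "active" meaning an activation strictly above zero; the dashboard's actual threshold and sample size are not stated, and `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens on which a feature fires.

    acts: 1-D array of the feature's activation on every sampled token.
    A token counts as active when its activation exceeds `threshold` (assumed 0).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Made-up sample: 279 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:279] = 1.0
density = activation_density(acts)
```

Under this definition, 279 active tokens out of 100,000 correspond to a density of 0.279%.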