INDEX
Explanations
words related to desperation and urgency
words related to performance metrics or evaluations
Negative Logits
ategory      -0.69
ESCO         -0.62
href         -0.61
ADRA         -0.60
76561        -0.60
krit         -0.58
ourt         -0.58
ellen        -0.58
QR           -0.57
akes         -0.56
Positive Logits
manent       1.09
due          1.05
monkey       0.94
nel          0.94
intendent    0.94
cussion      0.91
stein        0.90
ior          0.85
ium          0.85
vasive       0.83
Activations Density 0.026%