INDEX
Explanations
references to qualities that may depict criticism or commentary
Negative Logits
rompt        -0.77
heid         -0.70
icer         -0.65
pload        -0.62
alid         -0.59
utive        -0.59
thora        -0.59
Mellon       -0.59
asar         -0.58
moil         -0.58
Positive Logits
importantly   1.33
afa           1.06
notably       0.94
likely        0.92
likely        0.90
important     0.83
prominently   0.83
egreg         0.83
notable       0.75
mornings      0.75
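Dashboards of this kind commonly produce the two logit lists above by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values: the most negative first, then the most positive. The sketch below assumes that method; it is not confirmed by this page. The function names `logit_effects` and `top_and_bottom`, and the toy matrices in the usage note, are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Assumed method: dot the feature's decoder direction with each column of W_U.

    decoder_dir has length d_model; w_u has d_model rows of length vocab_size.
    Returns one logit effect per vocabulary token.
    """
    if len(decoder_dir) != len(w_u):
        raise ValueError("decoder_dir length must equal the number of rows in w_u")
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative (ascending) and k most positive (descending) tokens."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive
```

With a toy 2-dimensional model and a 4-token vocabulary, `logit_effects([1.0, -0.5], [[1.0, 0.0, -1.0, 0.5], [0.0, 4.0, 0.0, 1.0]])` gives `[1.0, -2.0, -1.0, 0.0]`, and `top_and_bottom` with `k=2` ranks `b` and `c` as the most negative tokens and `a` and `d` as the most positive. Note that tokens which differ only in leading whitespace would appear as separate entries, which may explain why `likely` is listed twice above.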
Activation Density: 0.665%
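A density of 0.665% means the feature fires on roughly 665 of every 100,000 tokens, or about 1 token in 150. The sketch below assumes density is the percentage of token positions where the feature's activation is strictly above zero, measured over whatever sample the dashboard used; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: a token counts as "firing" only if its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

For example, 5 firing tokens out of 1,000 give `activation_density([0.0] * 995 + [0.3] * 5) == 0.5`, i.e. 0.5%.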