INDEX
Explanations
references to failure or lack of success
Negative Logits
onen        -0.78
dar         -0.77
utra        -0.75
enfranch    -0.75
iser        -0.75
ript        -0.74
inda        -0.73
arya        -0.72
onto        -0.72
atu         -0.72
Positive Logits
miser        1.68
lect         1.03
horribly     1.03
dism         1.02
catast       1.00
ingly        0.93
fully        0.89
spectacular  0.85
DEV          0.85
nces         0.84
Activations Density 7.681%
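
A minimal sketch of how a density figure like the one above could be computed. It assumes that "activations density" means the fraction of sampled tokens on which the feature has a nonzero activation; the dashboard itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. The source dashboard does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative activations for eight tokens (made-up values, not from the dashboard).
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # 2 of 8 tokens are active
```

Under this reading, a density of 7.681% would mean the feature fires on roughly one in thirteen sampled tokens.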