INDEX
Explanations
information related to endings or conclusions
Negative Logits
shire       -0.78
ingen       -0.76
velt        -0.75
cott        -0.72
afort       -0.72
selves      -0.71
relative    -0.70
md          -0.69
friends     -0.69
kun         -0.68
Positive Logits
straw         1.10
installment   1.01
hurdle        0.93
nail          0.93
showdown      0.92
blow          0.89
curtain       0.88
resting       0.87
stages        0.86
piece         0.85
Activations Density: 0.024%