INDEX
Explanations
phrases indicating inevitability or outcomes based on choices
NEGATIVE LOGITS
itſelf            -0.75
AndEndTag         -0.73
ModelExpression   -0.72
raiſ              -0.72
myſelf            -0.69
Efq               -0.67
ſind              -0.67
@[+][             -0.67
requency          -0.67
protoimpl         -0.64
POSITIVE LOGITS
doomed            0.72
gone              0.71
DONE              0.62
done              0.61
dead              0.61
game              0.60
GONE              0.60
完了              0.60
toast             0.57
Done              0.57
Activations Density: 0.228%