INDEX
Explanations
phrases indicating actions related to personal experiences or feelings
Negative Logits
ſelf     -0.74
Anſ      -0.68
itſelf   -0.65
iſt      -0.65
Eſ       -0.64
ſelves   -0.63
Diſ      -0.63
Theſe    -0.62
Efq      -0.61
poffe    -0.61
Positive Logits
decided  0.98
décide   0.79
opted    0.79
chose    0.74
решили   0.74
began    0.74
decidió  0.73
went     0.72
特意     0.70
решила   0.70
Activations Density: 0.453%