INDEX
Explanations
occurrences of tense shifts or actions related to decision-making
Negative Logits
inger       -0.17
ingers      -0.14
айд         -0.14
wherever    -0.14
Recovered   -0.14
heim        -0.14
UCT         -0.14
inx         -0.13
uhan        -0.13
idential    -0.13
Positive Logits
fallback    0.18
frustration 0.16
orra        0.16
resort      0.16
方案        0.16
despair     0.15
frustrated  0.15
ibar        0.15
失败        0.15
ery         0.15
Activations Density 0.398%