Explanations
Phrases related to motives or reasons for actions
Arguments related to motivations and reasons behind decisions or actions
Negative Logits
isode       -0.92
iard        -0.90
»Ĵ          -0.90
elta        -0.87
byter       -0.81
inis        -0.77
Cover       -0.75
ÙĴ          -0.75
enaries     -0.75
代          -0.74
Positive Logits
sheer        1.38
inexper      1.37
curiosity    1.35
nostalgia    1.35
impat        1.30
jealousy     1.29
boredom      1.28
desire       1.27
ignorance    1.26
fear         1.26
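A common way to produce logit lists like these is to project the feature's decoder direction through the model's unembedding matrix. You then keep the tokens with the most negative and most positive scores. The sketch below assumes this method. The helpers `logit_effects` and `top_k` and the toy 2-dimensional vectors are hypothetical. The numbers are made up and do not come from this feature.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed_cols: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }

def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens, each ordered by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy data: a 2-dimensional "model" with a four-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed_cols = {
    "curiosity": [1.0, 0.4],
    "fear": [0.8, 0.6],
    "isode": [-0.7, -0.4],
    "byter": [-0.5, -0.6],
}
negative, positive = top_k(logit_effects(decoder_dir, unembed_cols), k=2)
print(negative)
print(positive)
```

In a real model the same projection is a single matrix product over the full vocabulary. A positive score means the feature pushes that token's logit up, and a negative score means it pushes the logit down.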
Activation Density: 0.324%
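Activation density is usually the share of sampled tokens on which the feature fires. A value of 0.324% therefore corresponds to about 3.24 tokens per 1,000, or roughly 1 in 309. The sketch below is minimal and assumes two things: "fires" means an activation strictly above zero, and the density is taken over a fixed sample of token activations. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# Hypothetical toy sample: 3 of 1000 tokens have a positive activation.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3f}%")  # prints 0.300%
```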