INDEX
Explanations
prepositions or conjunctions at the beginning of a phrase followed by a strong sentiment or action term towards the end of the phrase
concepts related to cause-and-effect relationships or outcomes
Negative Logits
Token     Logit
ounding   -0.85
abs       -0.74
isible    -0.71
vana      -0.70
egu       -0.70
ét        -0.69
ums       -0.68
imens     -0.68
aeda      -0.68
ophy      -0.68
Positive Logits
Token     Logit
they      0.99
it        0.84
opted     0.82
reverted  0.80
we        0.79
forgot    0.79
he        0.77
became    0.75
there     0.75
manages   0.75
Activations Density: 0.466%