Explanations
verbs representing desires or preferences
expressions of desire or expectation
Negative Logits
hement    -0.75
andy      -0.61
haz       -0.59
vati      -0.58
travel    -0.57
hero      -0.56
velt      -0.56
shoot     -0.56
lves      -0.55
stant     -0.55
Positive Logits
happening     0.92
happen        0.82
beforehand    0.77
ABOUT         0.74
"},"          0.74
about         0.72
when          0.71
transpired    0.71
versus        0.70
happens       0.70
Activations Density: 0.175%