INDEX
Explanations
verbs related to desire or wanting
Negative Logits
Token        Logit
VERTISEMENT  -0.73
icol         -0.66
livious      -0.62
trust        -0.61
semble       -0.60
unny         -0.60
icist        -0.60
rir          -0.59
ulty         -0.58
errors       -0.57
Positive Logits
Token          Logit
revenge        0.91
reprene        0.90
to             0.87
answers        0.77
nothing        0.74
attention      0.74
clarification  0.73
assurances     0.72
someone        0.72
something      0.72
Activations Density 0.943%