INDEX
Explanations
words related to desires and intentions
Negative Logits
VERTISEMENT    -0.64
icol           -0.64
anches         -0.61
livious        -0.60
icist          -0.59
semble         -0.58
trust          -0.58
rir            -0.58
ilian          -0.58
bug            -0.57
Positive Logits
reprene        0.91
to             0.89
revenge        0.82
answers        0.76
nothing        0.72
assurances     0.72
someone        0.69
somebody       0.69
only           0.68
attention      0.68
Activations Density: 0.444%