INDEX
Explanations
expressions of desires and goals
expressions of desire or intent
Negative Logits
Token        Logit
asse         -0.71
rir          -0.67
frac         -0.66
icol         -0.64
icist        -0.64
NVIDIA       -0.61
scar         -0.59
ashing       -0.59
iverpool     -0.59
illian       -0.57
Positive Logits
Token            Logit
reprene          0.94
everyone         0.82
everybody        0.81
to               0.80
answers          0.78
someone          0.76
somebody         0.76
revenge          0.76
clarification    0.74
clarity          0.72
Activations Density 0.091%
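
The page does not define the density figure. As a minimal sketch, assuming it means the share of token positions on which the feature's activation is above zero, it could be computed as below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" is the fraction of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 9 of which activate the feature.
sample = [0.0] * 10_000
for i in range(9):
    sample[i * 1000] = 1.5  # arbitrary nonzero activation value

density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.090%"
```

Under that reading, a density of 0.091% would mean the feature is active on roughly 9 out of every 10,000 token positions.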