INDEX
Explanations
phrases containing the word "want" with various levels of intensity
expressions of desire and intention
Negative Logits
NVIDIA      -0.62
ilings      -0.61
livious     -0.58
ulty        -0.58
illian      -0.57
obser       -0.57
ielding     -0.57
icol        -0.56
hesis       -0.55
usting      -0.55
Positive Logits
to          1.10
revenge     1.03
nothing     0.90
only        0.87
vengeance   0.86
answers     0.80
permission  0.75
to          0.75
reprene     0.75
something   0.73
Activations Density: 0.111%