Explanations
phrases indicating desire or intention
expressions of desire or intent
Negative Logits
Token          Logit
Fowler         -0.65
rir            -0.61
Notting        -0.61
Condition      -0.60
guiActiveUn    -0.59
rium           -0.58
ilial          -0.58
roach          -0.57
may            -0.57
MRI            -0.56

Positive Logits
Token          Logit
sake           0.68
honesty        0.67
answers        0.67
cleaned        0.64
clarity        0.63
honest         0.63
smoot          0.63
revenge        0.63
thood          0.63
rebuilt        0.62
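Logit lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The sketch below assumes exactly that; `decoder_direction`, `W_U`, and `vocab` are hypothetical names, and a real dashboard may add details (such as layer-norm folding) that are not shown here.

```python
import numpy as np

def top_logits(decoder_direction, W_U, vocab, k=10):
    """Return the k lowest and k highest (token, logit) pairs.

    Hypothetical inputs (assumptions, not a specific library's API):
      decoder_direction: shape (d_model,), the feature's decoder vector
      W_U: shape (d_model, vocab_size), the unembedding matrix
      vocab: list of vocab_size token strings
    """
    logits = decoder_direction @ W_U   # shape (vocab_size,)
    order = np.argsort(logits)         # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.5, 2.0]])
direction = np.array([1.0, 0.25])
neg, pos = top_logits(direction, W_U, vocab, k=2)
print(neg)  # [('b', -1.0), ('d', 0.5)]
print(pos)  # [('a', 1.0), ('c', 0.625)]
```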
Activation density: 0.163%
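Activation density is commonly read as the share of token positions on which the feature fires (activation above zero); under that assumption, 0.163% corresponds to roughly 163 tokens per 100,000. A minimal sketch, where the function name and the zero threshold are illustrative assumptions:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 163 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:163] = 1.0
print(activation_density(acts))  # 0.163
```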