INDEX
Explanations
phrases indicative of guidance or advice in various contexts
Negative Logits
pleaſure            -0.96
houſe               -0.90
itſelf              -0.90
ſelf                -0.89
himſelf             -0.85
poffible            -0.84
Jefus               -0.84
findpost            -0.84
setVerticalGroup    -0.82
purpoſe             -0.82
Positive Logits
fucked      0.68
...         0.67
ça          0.66
…           0.66
nice        0.66
haha        0.65
shitty      0.65
stupidly    0.65
fucking     0.63
shit        0.62
Activations Density 0.030%