INDEX
Explanations
phrases indicating an invitation for interaction or feedback
expressions of encouragement or invitation to engage
Negative Logits
notch        -0.70
PBS          -0.67
Bastard      -0.66
Parenthood   -0.63
Mub          -0.63
misc         -0.61
Boko         -0.59
netflix      -0.58
Mun          -0.57
Lup          -0.57
Positive Logits
INGS         0.79
ings         0.79
compelled    0.78
ername       0.78
comfortable  0.75
good         0.75
eline        0.73
free         0.73
pee          0.72
iven         0.72
Activations Density 0.043%
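
To put the density figure in perspective, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the page itself does not define it. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the page): 100,000 token positions, 43 of them active.
toy_activations = [0.0] * 99_957 + [1.5] * 43
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.043%
```

Under that reading, a density of 0.043% corresponds to roughly 43 active tokens per 100,000 sampled, which marks this as a sparse feature.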