INDEX
Explanations
positive informal descriptions
Negative Logits
fucking       0.72
fucked        0.66
fuck          0.60
Fuck          0.60
fuck          0.55
manifestly    0.54
Fuck          0.53
shit          0.52
bullshit      0.51
projective    0.50
Positive Logits
kiddos        0.85
BFF           0.70
veggies       0.65
hubby         0.61
groovy        0.61
bling         0.60
strutt        0.58
cudd          0.58
celebs        0.58
foodie        0.58
Activations Density 0.078%