INDEX
Explanations
words and phrases related to providing assistance, support, and care for others
actions related to influencing or controlling others
NEGATIVE LOGITS
recorded      -0.64
ãĥ¼ãĥĨãĤ£     -0.63
vertisement   -0.62
teasp         -0.61
vertising     -0.60
ÃĥÃĤ          -0.59
ãĥ¼ãĥĨ        -0.59
ogg           -0.59
luster        -0.57
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ  -0.57
POSITIVE LOGITS
thee          0.73
yourselves    0.71
oneself       0.70
)?            0.64
alot          0.63
rapists       0.62
yourself      0.62
him           0.62
Rs            0.61
me            0.59
Activations Density 0.849%