INDEX
Explanations
keywords related to ownership or possession
references to collective ownership or involvement
Negative Logits
puff      -0.78
tar       -0.71
netflix   -0.71
bender    -0.69
more      -0.69
atican    -0.68
Ambro     -0.67
wic       -0.66
Goes      -0.65
urt       -0.64
Positive Logits
selves       1.31
own          1.29
respective   1.01
beloved      0.94
adversaries  0.92
adversary    0.91
collective   0.88
asses        0.88
newest       0.88
motto        0.86
Activations Density: 0.125%