INDEX
Explanations
phrases indicating influence or control over others
references to individuals and their interactions or roles
Negative Logits
advertising    -0.79
Cosponsors     -0.71
\/\/           -0.70
ben            -0.69
coward         -0.68
GEN            -0.68
¥µ             -0.63
Lauder         -0.62
haps           -0.62
iple           -0.62
Positive Logits
pegged         0.86
nailed         0.86
installed      0.85
figured        0.81
hooked         0.79
towed          0.78
implanted      0.77
sorted         0.75
authenticated  0.73
icum           0.73
Activations Density 0.397%