Explanations
phrases associated with personal beliefs or convictions
phrases indicating interpersonal relationships and actions
Negative Logits
endiary  -0.83
interstitial  -0.76
prepar  -0.76
ancial  -0.73
urgical  -0.72
(unrendered token)  -0.69
everal  -0.68
earchers  -0.68
umerous  -0.67
urst  -0.67
Positive Logits
me  1.56
yourself  1.51
yourselves  1.51
him  1.48
myself  1.33
us  1.31
HIM  1.29
ya  1.28
somebody  1.26
thee  1.25
Activation Density: 0.725%