INDEX
Explanations
words related to expressing thoughts and opinions
expressions of personal agency and emotional experiences
Negative Logits
themselves      -0.67
apiece          -0.64
respectively    -0.63
idates          -0.60
Their           -0.57
tariffs         -0.51
turnover        -0.51
Us              -0.50
idges           -0.49
populous        -0.48
Positive Logits
myself          1.88
my              1.39
My              0.89
MY              0.86
my              0.83
My              0.83
blogging        0.74
mine            0.74
am              0.70
I               0.67
Activations Density: 1.211%