INDEX
Explanations
first-person pronouns and expressions of personal involvement or feelings
Negative Logits
ansa          -0.16
iland         -0.15
509           -0.15
.libs         -0.14
adius         -0.14
omu           -0.14
sponsoring    -0.13
ny            -0.13
elper         -0.13
èĬĿ           -0.13
Positive Logits
personally    0.20
himself       0.15
embros        0.15
itm           0.15
Daly          0.15
FORCE         0.15
rog           0.14
FORCE         0.14
åĢij           0.14
sv            0.14
Activations Density: 0.143%