INDEX
Explanations
first-person pronouns indicating personal experiences and opinions
Negative Logits
lix              -0.15
lobe             -0.14
HEMA             -0.14
;element         -0.14
cdb              -0.14
oredProcedure    -0.14
Await            -0.14
зави             -0.14
spender          -0.13
plet             -0.13
Positive Logits
ronic            0.27
mean             0.25
ck               0.23
.e               0.22
even             0.20
e                0.20
kid              0.19
bet              0.18
joke             0.18
even             0.18
Activations Density 0.184%
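
As a rough illustration of what a density figure like this usually means, the sketch below computes the percentage of sampled token positions on which a feature fires. It assumes density is defined as the share of positions with activation above zero; the dashboard does not state its exact definition. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on
    which the feature is active; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 5000 token positions, 9 of which activate the feature.
sample = [0.0] * 4991 + [1.3, 0.4, 2.1, 0.7, 0.2, 3.0, 0.9, 1.1, 0.5]
print(f"{activation_density(sample):.3f}%")
```

A density well under 1%, as reported here, would indicate a sparse feature that fires on only a small fraction of tokens.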