INDEX
Explanations
references to collective actions or community statements
Negative Logits
argo        -0.17
reachable   -0.15
ylie        -0.15
.icons      -0.15
Favor       -0.15
elper       -0.14
suming      -0.14
plat        -0.14
butt        -0.14
AFX         -0.14
Positive Logits
couldn      0.25
couldn      0.23
Couldn      0.23
are         0.20
feel        0.19
proud       0.19
congr       0.19
happy       0.18
są          0.18
be          0.18
Activations Density 0.103%
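
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled token positions on which the feature has a nonzero activation (the function name, the threshold parameter, and the sample data are all hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the activation exceeds threshold.

    Assumption: "density" means active positions / total positions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 103 of 100,000 token positions.
sample = [1.5] * 103 + [0.0] * (100_000 - 103)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.103%
```

Under that assumption, a density of 0.103% corresponds to the feature firing on roughly one token in a thousand.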