Explanations
punctuation marks, particularly quotes and parentheses
Negative Logits
ÙĩÙĦ      -0.17
rights    -0.17
å¹¹       -0.14
selling   -0.14
lear      -0.14
noun      -0.14
reve      -0.14
!***      -0.14
çīĮ       -0.14
votes     -0.14
Positive Logits
s         0.21
de        0.21
od        0.20
als       0.20
ese       0.19
anges     0.19
dy        0.19
ok        0.19
ses       0.18
be        0.18
Activations Density 0.040%