Explanations
instances of people expressing opinions or making claims
Negative Logits
Token      Logit
sburg      -0.22
yesterday  -0.21
ges        -0.21
earlier    -0.19
бы         -0.19
s          -0.18
ries       -0.17
ly         -0.17
sb         -0.17
elik       -0.16
Positive Logits
Token      Logit
了一       0.18
了         0.18
(ed        0.17
asion      0.16
oron       0.16
able       0.15
indr       0.15
erved      0.15
ouve       0.15
_typeof    0.14
Activations Density 0.053%
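
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading on toy data. It assumes "density" means the fraction of tokens whose activation is strictly above zero; the dashboard's exact definition is not stated here, and the function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 tokens, 5 of which activate the feature.
toy_activations = [0.0] * 9995 + [0.7, 1.2, 0.3, 2.1, 0.9]
density = activation_density(toy_activations)

# Print as a percentage, in the same style as the dashboard value.
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.053% would correspond to roughly 53 active tokens per 100,000 sampled.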