INDEX
Explanations
negative descriptors and terms related to criticism
Negative Logits
Token    Logit
is       -0.51
ia       -0.40
d        -0.32
ي        -0.31
us       -0.30
i        -0.29
t        -0.28
ة        -0.28
al       -0.28
e        -0.27
Positive Logits
Token    Logit
ror      0.17
othy     0.17
thern    0.16
bid      0.15
respond  0.15
theast   0.15
phan     0.14
بین      0.14
иш       0.14
rr       0.13
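Non-ASCII tokens in lists like these are often displayed in a byte-remapped form (for example `ÙĬ` standing for `ي`). The sketch below shows one way to recover the readable text, assuming the tokenizer uses the GPT-2 style byte-to-unicode table; that assumption is not confirmed by this page, and the helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> display-character table (assumed here)."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Convert a displayed token string back to readable UTF-8 text."""
    raw = bytearray()
    for ch in display:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table were already decoded
            # upstream, so they are kept unchanged.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")
```

Plain ASCII tokens such as `respond` pass through unchanged, so the helper can be applied to every row of a logit list.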
Activations Density 0.035%
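As a rough illustration of what a density figure like this can mean, the sketch below computes the fraction of tokens on which a feature is active. It assumes density is defined as the share of sampled tokens with an activation above a threshold (zero by default); the page does not state its exact definition, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample, not data from this page: 7 active tokens out of 20,000.
sample = [0.0] * 19_993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this definition, a feature that fires on 7 of every 20,000 tokens has a density of 0.035%.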