INDEX
Explanations
instances of criticism and its various forms in discussions
Negative Logits
ÏĢει        -0.16
cribe       -0.15
ook         -0.15
ause        -0.14
ey          -0.14
grace       -0.14
itre        -0.14
ryo         -0.14
ent         -0.14
ente        -0.14
Positive Logits
-minded     0.18
icism       0.17
atory       0.17
atically    0.17
izes        0.16
imler       0.16
kovi        0.15
minded      0.15
voices      0.15
antium      0.15
Activations Density 0.036%