INDEX
Explanations
instances of contrasting opinions or evaluations
Negative Logits
etail       -0.18
icone       -0.17
mony        -0.16
riday       -0.16
ÏĨÏħ        -0.15
ấn          -0.15
FINE        -0.14
mj          -0.14
ledge       -0.14
agine       -0.14
Positive Logits
despite     0.24
overall     0.19
Overall     0.18
although    0.18
Despite     0.17
reviewers   0.17
spite       0.17
Parts       0.17
parts       0.16
Although    0.16
Activations Density 0.020%