INDEX
Explanations
phrases related to controversy or opposition
markers of negative judgment or criticism
Negative Logits
enta        -0.73
Bunny       -0.67
ctors       -0.65
Somerset    -0.65
unmarked    -0.63
bunny       -0.62
creen       -0.62
sled        -0.62
dig         -0.62
Manhattan   -0.61
Positive Logits
âĹ¼         0.97
âĢł         0.90
(blank)     0.86
§           0.84
âģ          0.81
ISIS        0.81
¯           0.80
(blank)     0.78
¬           0.77
â           0.75
Activations Density 0.358%