INDEX
Explanations
phrases related to dispute or conflict
phrases containing varying degrees of negativity or strong criticism
Negative Logits
Token     Logit
Gaul      -0.75
Rudd      -0.75
SAM       -0.73
Polk      -0.72
Nau       -0.69
Slug      -0.69
Monkey    -0.67
Doodle    -0.66
Hud       -0.65
Filter    -0.64
Positive Logits
Token         Logit
extremely     1.15
responsible   1.14
expected      1.12
reci          1.10
emb           1.09
treated       1.08
very          1.07
sufficient    1.05
absolutely    1.05
really        1.05
Activations Density 0.093%
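
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes (this page does not state it) that activation density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature is active. This is an illustrative definition, not one
    confirmed by the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 93 active positions out of 100,000 tokens.
sample = [1.0] * 93 + [0.0] * 99_907
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.093%
```

Under that assumption, a density of 0.093% would correspond to the feature being active on roughly 93 of every 100,000 tokens.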