INDEX
Explanations
phrases related to societal issues, opinions, and beliefs
phrases indicating consequences and opinions related to societal issues
Negative Logits
è£ħ               -0.80
ONSORED           -0.75
ħĭ                -0.73
soDeliveryDate    -0.68
ovember           -0.67
uthor             -0.66
ãĤ´ãĥ³            -0.66
Hack              -0.65
URR               -0.63
TION              -0.62
Positive Logits
whereas           0.79
regardless        0.77
they              0.72
THEY              0.71
selves            0.70
beware            0.70
They              0.69
often             0.69
sometimes         0.68
they              0.67
Activations Density: 0.960%