INDEX
Explanations
phrases related to political ideologies and societal concepts
statements related to societal issues and personal reflections
Negative Logits
etheless    -0.57
]).         -0.55
DCS         -0.53
"))         -0.51
ãĤ´ãĥ³       -0.49
collect     -0.48
jiang       -0.48
issance     -0.48
èĢ          -0.46
Beir        -0.46
Positive Logits
weaker      0.46
bounce      0.46
weak        0.44
however     0.44
horrible    0.43
machine     0.42
machines    0.41
knowing     0.41
sed         0.40
though      0.40
Activations Density 4.031%