INDEX
Explanations
phrases related to opinions or stances on various issues
references to interpretations of moral codes and opinions about societal issues
Negative Logits
Synopsis    -0.69
Called      -0.64
xtap        -0.64
word        -0.64
cour        -0.60
Adds        -0.60
Spoiler     -0.58
Whe         -0.58
Incre       -0.58
Upon        -0.58
Positive Logits
merce        0.72
ļé           0.70
oneself      0.69
allery       0.67
actual       0.64
Fiat         0.62
gans         0.62
ohm          0.60
actual       0.60
others       0.60
Activation Density 0.772%
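The density figure is most naturally read as the percentage of sampled tokens on which the feature's activation is above zero, so 0.772% means the feature fires on roughly 1 token in 130. The sketch below shows that calculation under stated assumptions: per-token activations arrive as a flat list, "fires" means strictly greater than a zero threshold, and the function name and sample values are hypothetical rather than taken from the dashboard's own pipeline.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for 1,000 tokens: the feature fires on 8 of them.
acts = [0.0] * 992 + [1.3, 0.4, 2.1, 0.9, 0.2, 1.7, 0.5, 3.0]
print(f"Activation Density {activation_density(acts):.3f}%")  # prints "Activation Density 0.800%"
```

A nonzero `threshold` could be passed if small activations should be treated as noise; the dashboard does not state which convention it uses.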