Explanations
emotionally charged and impactful words or phrases
abstract concepts related to morality, conflict, and the human experience
Negative Logits
CHAT        -0.78
ificant     -0.74
ificantly   -0.66
zl          -0.62
APD         -0.62
azeera      -0.61
eatures     -0.60
ittees      -0.59
ersen       -0.59
oops        -0.58
Positive Logits
ankind      0.85
lessness    0.78
emanating   0.68
thood       0.67
fulness     0.66
fame        0.65
itself      0.65
nesia       0.65
beard       0.64
ropy        0.63
Activations Density 0.466%