INDEX
Explanations
phrases related to impactful statements or actions
phrases associated with impactful statements or criticisms
Negative Logits
nces        -1.16
rics        -0.73
imester     -0.71
lez         -0.71
uay         -0.70
xual        -0.70
rified      -0.69
rio         -0.69
roups       -0.67
alg         -0.66
Positive Logits
iceberg     1.04
proverbial  0.97
coffin      0.86
Archdemon   0.65
Coffin      0.65
onion       0.65
pear        0.63
sund        0.63
colonial    0.62
Onion       0.60
Activations Density 0.146%