INDEX
Explanations
phrases related to social responsibility and accountability
sentences that express key conclusions or statements
Negative Logits
Token         Logit
ruin          -0.86
desper        -0.85
pretended     -0.82
yip           -0.81
horrend       -0.80
pse           -0.80
undermin      -0.79
slightest     -0.78
ugly          -0.78
sucker        -0.77
Positive Logits
Token         Logit
Additionally  1.48
Through       1.44
Together      1.32
Currently     1.26
Accordingly   1.25
Through       1.25
Specifically  1.19
Throughout    1.19
Beginning     1.18
Learn         1.18
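The logit lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. A minimal sketch of how such lists can be produced, assuming the common dashboard convention that a token's score is the feature's decoder direction multiplied by the model's unembedding matrix (final layer norm ignored). The function name `top_logit_effects` and the toy data are hypothetical, not taken from the original tool.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score), k entries each.

    Assumption: score[token] = decoder_vec @ unembed[:, token], where
    `unembed` has shape d_model x vocab_size (one row per model dimension).
    """
    d_model = len(decoder_vec)
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    # Largest scores first for the positive list, smallest first for the negative list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with a 2-dimensional model and a 3-token vocabulary.
pos, neg = top_logit_effects(
    [1.0, -0.5],
    [[0.2, -1.0, 0.5], [0.4, 0.0, -1.0]],
    ["a", "b", "c"],
    k=2,
)
print(pos)  # [('c', 1.0), ('a', 0.0)]
print(neg)  # [('b', -1.0), ('a', 0.0)]
```

In a real model the vocabulary contains subword pieces, which is why fragments such as "desper" or "undermin" appear in the lists, and two distinct token IDs can render identically (as "Through" does twice above).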
Activation Density: 0.362%
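Activation density is usually reported as the percentage of sampled tokens on which the feature fires; 0.362% corresponds to roughly 3.6 tokens per 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name and threshold parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation to compute a density")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 25.0
```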