Explanations
fragmented sentences that discuss various opinions and arguments
statements regarding intellectual honesty and the consequences of ignoring suggestions
Negative Logits
Token       Logit
Located     -0.72
Bang        -0.71
Bang        -0.68
culosis     -0.63
Built       -0.62
enery       -0.62
Located     -0.61
é£          -0.61
iage        -0.60
ãĢį         -0.60
Positive Logits
Token        Logit
cynicism     1.02
rhetorical   0.89
disingen     0.88
paraph       0.84
nonetheless  0.80
plaus        0.80
hypocrisy    0.79
disclaim     0.78
cynical      0.78
rhet         0.77
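The logit lists above rank vocabulary tokens by how strongly the feature's output direction raises or lowers their logits. The sketch below shows one common way such lists are derived. It assumes the feature's decoder vector is projected directly through the model's unembedding matrix, and any normalization this dashboard may apply is left out. The names `decoder_vec`, `unembed`, and `vocab` are hypothetical, and the toy numbers in the usage example are illustrative only, not this feature's weights.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Return the k most positive and k most negative (token, score) pairs.

    decoder_vec: hypothetical feature direction, a list of d_model floats.
    unembed: hypothetical unembedding matrix, d_model rows of len(vocab) floats.
    """
    d_model = len(decoder_vec)
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        s = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    positive = sorted(scores, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda p: p[1])[:k]
    return positive, negative


# Toy usage with made-up weights (2-dimensional residual stream, 4-token vocab).
vocab = ["cynicism", "rhetorical", "Bang", "Located"]
decoder_vec = [1.0, -0.5]
unembed = [[1.0, 0.8, -0.6, -0.8],
           [0.0, 0.2, 0.2, 0.0]]
pos, neg = logit_effects(decoder_vec, unembed, vocab, k=2)
```

Ranking by a single dot product keeps the sketch simple. Because of this, repeated entries such as the two "Bang" rows can come from distinct vocabulary items that print identically, for example variants with and without a leading space.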
Activation Density: 1.355%
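Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0, measured over some sample of tokens. The exact definition, threshold, and corpus behind the 1.355% figure are not stated here, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy usage: 13 firing positions out of 1,000 gives a density of 1.3%.
acts = [0.0] * 987 + [0.4] * 13
density = activation_density(acts)
```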