INDEX
Explanations
sentences containing personal comments or reflections
expressions of personal sentiments or reflections on societal issues
Negative Logits
secret         -0.66
-              -0.64
amon           -0.61
speculation    -0.60
sbm            -0.59
indal          -0.58
unknown        -0.57
disastrous     -0.56
unknown        -0.56
devastating    -0.55
Positive Logits
sanity      1.11
decency     1.08
sane        1.04
respect     1.04
calmed      0.98
honesty     0.93
unbiased    0.93
cknow       0.93
honest      0.92
sensible    0.90
Activations Density: 1.540%