INDEX
Explanations
terms related to strong emotional reactions and specific biological processes
topics related to health, social issues, and environmental concerns
Negative Logits
Token      Logit
?)         -0.59
with       -0.52
ivating    -0.52
lance      -0.51
odied      -0.50
?),        -0.50
encia      -0.47
?)         -0.47
ZA         -0.47
feat       -0.47
Positive Logits
Token      Logit
.</        0.81
.:         0.77
.�         0.74
.#         0.73
.<         0.69
.''        0.69
.'         0.69
.*         0.67
.","       0.67
%.         0.63
Activations Density 0.890%