Explanations
emotional descriptors and adjectives related to various societal issues
Negative Logits
own           -0.17
Enhancement   -0.16
Own           -0.15
ka            -0.14
qual          -0.14
another       -0.14
modification  -0.14
a             -0.14
ورا           -0.14
atch          -0.14
Positive Logits
nature    0.45
nature    0.38
ness      0.27
Nature    0.26
Nature    0.25
confines  0.23
aspect    0.23
aspects   0.22
性        0.22
majority  0.21
Activations Density 0.653%