INDEX
Explanations
negative or descriptive emotional states and attributes
qualities and states
Negative Logits
ſind
-0.98
-0.96
للاسماء
-0.95
Menſchen
-0.94
<unused41>
-0.94
<unused28>
-0.94
<unused47>
-0.94
<unused14>
-0.94
<unused79>
-0.94
[@BOS@]
-0.93
Positive Logits
—
0.35
↵
0.27
news
0.27
...
0.26
↵↵
0.24
law
0.24
↵↵↵
0.23
<eos>
0.23
pri
0.23
${
0.23
Activations Density 0.041%