Explanations
negative aspects or attributes associated with concepts
descriptions of origin or type
Negative Logits
[token not captured]   -0.37
the                    -0.35
The                    -0.34
2                      -0.32
May                    -0.32
to                     -0.32
Jurí                   -0.31
Bedürfn                -0.31
You                    -0.31
1                      -0.31
Positive Logits
queſta                 0.79
<unused28>             0.76
[@BOS@]                0.76
<unused51>             0.76
<unused14>             0.76
<unused8>              0.76
<unused16>             0.76
<unused79>             0.76
<unused3>              0.75
<unused23>             0.75
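The page does not say how the two logit lists are produced. A common convention for sparse-autoencoder feature dashboards is to multiply the feature's decoder direction by the model's unembedding matrix and list the tokens with the largest and smallest resulting values. The sketch below assumes that convention; the helper names `logit_effects` and `top_and_bottom` and the four-token toy vocabulary are hypothetical, not taken from this page.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed convention: dot product of the decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens."""
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Ascending sort puts the most negative token first.
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Hypothetical toy data: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    "a": [0.8, 0.0],
    "b": [0.1, 0.2],
    "c": [-0.3, -0.1],
    "d": [0.0, -0.6],
}

scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, k=2)
print("positive:", [t for t, _ in positive])  # positive: ['a', 'b']
print("negative:", [t for t, _ in negative])  # negative: ['c', 'd']
```

Sorting the negative list in ascending order puts the most negative token first, which matches the ordering of the negative values on this page (-0.37 down to -0.31).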
Activation Density 0.055%
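Activation density is usually read as the fraction of token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the page does not state the threshold), 0.055% corresponds to roughly 55 firing positions per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (threshold is an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 55 of them.
sample = [1.0 if i < 55 else 0.0 for i in range(100_000)]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.055%
```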