INDEX
Explanations
negative or critical sentiments
ex- titles and roles
Negative Logits
majánló      -1.23
ロウィン      -1.22
<unused42>   -1.18
<unused43>   -1.18
<unused28>   -1.18
<unused8>    -1.18
<unused14>   -1.17
<unused16>   -1.17
<unused3>    -1.17
[@BOS@]      -1.17
Positive Logits
former       0.86
Former       0.81
Former       0.77
ex           0.67
former       0.54
ex           0.54
ehemalige    0.53
eski         0.46
Ex           0.45
mantan       0.44
Activations Density 0.003%
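
A density figure like the one above is commonly read as the fraction of token positions on which the feature is active. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active positions out of 100,000 tokens.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

With made-up data of this shape, a feature that fires on 3 of 100,000 tokens has a density of the same order as the value reported here.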