INDEX
Explanations
expressions of giving up or leaving situations
Negative Logits
awy       -0.15
iyel      -0.14
berger    -0.14
rette     -0.14
itra      -0.13
Katz      -0.13
pieces    -0.13
ura       -0.13
ihan      -0.13
osome     -0.13
Positive Logits
nor          0.25
indre        0.18
Nor          0.16
ani          0.16
Nor          0.16
slightest    0.15
nor          0.15
evi          0.15
bs           0.15
adir         0.15
Activations Density 0.211%