INDEX
Explanations
expressions of desire and requests for action
Negative Logits
UnusedPrivate
-0.73
Personendaten
-0.72
numerus
-0.65
]))
-0.61
lenker
-0.60
'')
-0.58
(!__
-0.58
ural
-0.57
(""))
-0.57
'*')
-0.57
Positive Logits
revenge
0.69
quit
0.65
retali
0.65
HasForeignKey
0.63
fight
0.61
hurry
0.59
rejoin
0.58
revenge
0.58
stop
0.57
confess
0.56
Activations Density 0.317%
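As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature fires (activation above a threshold). That definition, the function name `activation_density`, and the sample data are all assumptions for illustration, not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.317% would mean the feature is active on roughly 3 of every 1000 sampled tokens.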