Explanations
sentences that indicate a lack of personal responsibility or avoidance of blame
Negative Logits
гоÑĤ      -0.16
ast       -0.15
adel      -0.14
åĸľ       -0.14
ÃŃd       -0.14
elts      -0.14
iska      -0.14
jos       -0.14
-guard    -0.14
ORB       -0.13
Positive Logits
ampo       0.17
UDGE       0.15
baum       0.15
_FILL      0.14
ombine     0.14
dorf       0.14
strup      0.14
isin       0.14
utow       0.14
leftright  0.13
Activations Density 0.529%
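
The density figure is not defined on this page. A common reading is the fraction of token positions at which the feature has a nonzero activation. The sketch below computes that quantity under this assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions where the feature fires (activation > threshold).

    Assumption: "density" means the share of token positions with a
    nonzero activation; the source page does not define it.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 5 with a nonzero activation.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.529% would mean the feature fires on roughly 5 of every 1000 tokens in the sampled text.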