Explanations
excuses and justifications for actions or behaviors, particularly where blame is being assigned or deflected
Negative Logits
Token        Logit
.opensource  -0.16
isay         -0.16
aises        -0.15
eldon        -0.15
egra         -0.14
íģ           -0.14
ÅĻe          -0.14
ÑĨенÑĤÑĢа     -0.14
fred         -0.14
oÅĽci        -0.14
Positive Logits
Token          Logit
oton           0.16
tup            0.16
Ulus           0.16
excuse         0.16
jit            0.15
éģĵ            0.15
arguments      0.15
arg            0.15
ulu            0.15
justification  0.14
Activations Density 0.126%