INDEX
Explanations
phrases that convey excuses or justifications for actions
Negative Logits
ILLE      -0.17
åĪij       -0.16
iano      -0.15
ILED      -0.15
illac     -0.14
mort      -0.14
isay      -0.14
renc      -0.14
angelo    -0.14
(#)       -0.13
Positive Logits
excuse           0.28
excuses          0.25
justification    0.19
justify          0.18
blame            0.17
ett              0.16
itu              0.16
du               0.15
justify          0.15
ึà¸ģ              0.15
Activations Density: 0.215%