INDEX
Explanations
phrases indicating moral judgment or evaluation
Negative Logits
since          -0.19
asm            -0.18
since          -0.18
pues           -0.17
considering    -0.17
seit           -0.16
awn            -0.15
Since          -0.15
unless         -0.15
Since          -0.14
Positive Logits
because        0.48
Because        0.42
porque         0.41
because        0.40
Because        0.40
потому         0.35
因为           0.34
ecause         0.33
omdat          0.33
parce          0.32
Activations Density 0.172%