INDEX
    Explanations

    phrases indicating moral judgment or evaluation

    Negative Logits
        Token           Logit
        " since"        -0.19
        "asm"           -0.18
        "since"         -0.18
        " pues"         -0.17
        " considering"  -0.17
        " seit"         -0.16
        "awn"           -0.15
        " Since"        -0.15
        "unless"        -0.15
        "Since"         -0.14

    Positive Logits
        Token           Logit
        " because"      0.48
        " Because"      0.42
        " porque"       0.41
        "because"       0.40
        "Because"       0.40
        " потому"       0.35
        "因为"          0.34
        "ecause"        0.33
        " omdat"        0.33
        " parce"        0.32
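Lists like the two above are commonly read as a feature's direct effect on the output vocabulary. A minimal sketch follows, assuming (this page does not state it) that each value is the dot product of the feature's decoder direction with a token's unembedding vector, and that the lists are the k lowest and k highest of those values. The function names `logit_effects` and `top_and_bottom` and all numbers in the toy example are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float], unembed: List[List[float]], vocab: List[str]
) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding matrix.

    Assumption: unembed is d_model x vocab_size, and column j is the
    unembedding vector of vocab[j]. Returns token -> direct logit
    contribution per unit of feature activation.
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]         # ascending order: most negative first
    positive = ranked[::-1][:k]   # descending order: most positive first
    return negative, positive


# Hypothetical toy example: d_model = 2, three-token vocabulary.
vocab = [" because", " since", " the"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, -0.1, 0.0],
    [0.2, -0.2, 0.1],
]
effects = logit_effects(decoder_dir, unembed, vocab)
negative, positive = top_and_bottom(effects, k=1)
print(negative, positive)
```

Tokens are kept as dictionary keys with their leading spaces, since " because" and "because" are distinct vocabulary entries, as the lists above show.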
    Activation density: 0.172%
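One common reading of activation density is the percentage of sampled token positions on which the feature is active. The sketch below assumes that definition and an "active" threshold of 0; neither is stated on this page. The function name `activation_density` and the 100,000-token sample are hypothetical and serve only to show how a figure like 0.172% arises (172 firing positions out of 100,000).

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: "active" means strictly greater than threshold (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 172 of 100,000 token positions.
acts = [0.0] * 99_828 + [1.3] * 172
print(f"Activation density: {activation_density(acts):.3f}%")
```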

    No Known Activations