INDEX
    Explanations

    instances of decision-making language

    Negative Logits
    Token          Logit
    " ricev"       -0.60
    "ęg"           -0.59
    "thru"         -0.57
    "Quoting"      -0.55
    " Abp"         -0.55
    "Promoting"    -0.53
    "idespread"    -0.53
    " obsługi"     -0.52
    "zzino"        -0.52
    "rouge"        -0.51

    Positive Logits
    Token          Logit
    " Decide"      1.17
    " DECISION"    1.16
    "Decide"       1.14
    " Decided"     1.13
    " decides"     1.11
    " deciding"    1.11
    " Decisions"   1.11
    " decide"      1.11
    " decisions"   1.11
    " Decision"    1.11
    Act Density 0.185%

    No Known Activations