INDEX
    Explanations

    phrases related to justifications and explanations

    Negative Logits
    Token                 Logit
    "providedIn"          -0.62
    " Talis"              -0.55
    " Elis"               -0.46
    " ModelExpression"    -0.44
    "principalTable"      -0.44
    "uVar"                -0.44
    "IERC"                -0.43
    " manageable"         -0.43
    "writerow"            -0.43
    "Portail"             -0.42
    Positive Logits
    Token                 Logit
    " reasons"            1.56
    " reason"             1.52
    " Reasons"            1.43
    "Reasons"             1.41
    " why"                1.40
    "reasons"             1.31
    "Reason"              1.27
    " Gründe"             1.26
    " Reason"             1.23
    " REASONS"            1.22
    Activation Density: 0.473%

    No Known Activations