INDEX
    Explanations

    reasons or explanations for specific situations

    mentions of "reasons" in various contexts

    Negative Logits
    "yss"        -0.70
    "oba"        -0.67
    "enged"      -0.67
    "ibaba"      -0.66
    "ILA"        -0.64
    "boro"       -0.62
    " tatt"      -0.60
    " likeness"  -0.59
    "aila"       -0.59
    "ascus"      -0.59
    Positive Logits
    " reasons"    1.06
    "abl"         0.94
    " unrelated"  0.90
    " why"        0.87
    " Reasons"    0.82
    "¶"           0.81
    "asons"       0.76
    " alone"      0.75
    "why"         0.74
    "Reason"      0.74
    Activation Density: 0.028%

    No Known Activations