    Explanations

    phrases related to explaining the reasoning or motivation behind something

    phrases indicating motivation or reasoning

    Negative Logits
    Token          Logit
    "ander"        -0.85
    " Pwr"         -0.75
    "issan"        -0.73
    "cki"          -0.72
    "aire"         -0.71
    "idential"     -0.69
    "istic"        -0.69
    "ennes"        -0.69
    "20439"        -0.67
    "alam"         -0.66
    Positive Logits
    Token          Logit
    "▬▬"           0.74
    " bars"        0.70
    " behind"      0.68
    " closed"      0.67
    "plates"       0.67
    " why"         0.66
    "wards"        0.66
    " WHY"         0.65
    "closed"       0.65
    "byn"          0.63
    Act Density 0.017%

    No Known Activations