INDEX
    Explanations

    questions and expressions of hope or concern

    Negative Logits
    Token       Logit
    " aren"     -0.18
    " Isn"      -0.17
    " wasn"     -0.17
    " isn"      -0.16
    " only"     -0.16
    " zwar"     -0.15
    " haven"    -0.15
    ".look"     -0.14
    " là"       -0.14
    " There"    -0.14
    Positive Logits
    Token         Logit
    " happens"    0.23
    " happened"   0.20
    " seper"      0.19
    " we"         0.18
    " drew"       0.17
    "Separ"       0.17
    " separates"  0.16
    " got"        0.16
    " Separ"      0.16
    " kept"       0.16
    Activation Density: 0.085%

    No Known Activations