    Explanations

    phrases that describe comparisons and contrasting situations

    Negative Logits
    "essen"     -0.19
    "uj"        -0.16
    "835"       -0.15
    "ehr"       -0.15
    "rike"      -0.15
    "ochond"    -0.15
    "oleon"     -0.15
    "emmel"     -0.15
    "nici"      -0.14
    "akit"      -0.14
    Positive Logits
    " ones"         0.30
    "ones"          0.16
    " Ones"         0.16
    "ãģIJ"          0.14
    "lik"           0.14
    " Dit"          0.14
    " dit"          0.14
    "lico"          0.14
    "ãģĿãĤĮãģ¯"     0.14
    " Fav"          0.13
    Activation Density: 0.149%

    No Known Activations