INDEX
    Explanations

    mentions of "pros" and, to a lesser extent, "cons" in various contexts

    Negative Logits
    endon       -0.18
    een         -0.16
    cene        -0.15
    ÃĹ↵↵        -0.15
    et          -0.15
    utschen     -0.14
    лиз         -0.14
    uations     -0.14
    ernote      -0.14
    emand       -0.14
    Positive Logits
    pective     0.29
    pects       0.26
    pector      0.26
    ively       0.23
     pros       0.19
    acco        0.18
    Pros        0.18
    peri        0.17
    pectives    0.17
    ely         0.17
    Activation Density: 0.012%

    No Known Activations