INDEX
    Explanations

    phrases indicating contrast or exception

    Negative Logits
    cken        -0.16
    eteria      -0.15
    erk         -0.15
    â̦"↵↵      -0.15
    heimer      -0.14
    çĸij         -0.14
    ëĿ½         -0.14
    вÑĸлÑĮ      -0.14
    εÏĦ         -0.14
    olt         -0.14
    Positive Logits
     being      0.19
     knowing    0.18
     it         0.16
    otic        0.16
     its        0.16
     which      0.15
     fact       0.15
     Ged        0.15
    ;           0.15
    edy         0.14
    Act Density 0.025%

    No Known Activations