INDEX
    Explanations

    phrases indicating the existence or continuation of established ideas or trends

    Negative Logits
    onis
    -0.21
    ogie
    -0.16
    rahim
    -0.16
    vro
    -0.15
    cu
    -0.15
    lí
    -0.15
    ONO
    -0.14
     Gow
    -0.14
    beg
    -0.14
    .bs
    -0.14
    Positive Logits
     nothing
    0.43
    nothing
    0.35
     Nothing
    0.34
     NOTHING
    0.32
    Nothing
    0.30
     surprise
    0.28
     Surprise
    0.28
     nichts
    0.26
     novel
    0.24
     ничего
    0.24
    Activation Density: 0.119%

    No Known Activations