INDEX
    Explanations

    phrases indicating novelty or uniqueness

    Negative Logits
    "á»ijt"       -0.20
    " Actual"    -0.18
    "ê´"         -0.15
    "ameleon"    -0.14
    "ãĤį"        -0.14
    "itte"       -0.14
    " [~,"       -0.14
    "Actual"     -0.14
    "zeug"       -0.14
    "ลà¸ĩ"       -0.14
    Positive Logits
    " previously"    0.34
    " elsewhere"     0.31
    " otherwise"     0.31
    " previous"      0.26
    "seen"           0.25
    " Previously"    0.25
    " seen"          0.24
    " else"          0.24
    "otherwise"      0.24
    " before"        0.23
    Act Density 0.094%

    No Known Activations