    Negative Logits
    i        0.84
    <bos>    0.74
    m        0.70
    Auf      0.69
    мый      0.67
    Nah      0.66
    ă        0.65
    Парт     0.65
    l        0.64
    дії      0.63
    Positive Logits
                     0.87
     forehead        0.84
    ˹                0.84
     unsupervised    0.84
     briefings       0.82
     subconscious    0.78
     Potatoes        0.78
     variances       0.78
     bookArray       0.77
     goodness        0.76
    Act Density 0.001%

    No Known Activations