    Explanations

    instances of the pronoun "we"

    Negative Logits
    Token     Logit
    noon      -0.18
    andon     -0.18
    rav       -0.17
    åĢij       -0.17
    mund      -0.17
    umn       -0.16
    wich      -0.16
    semble    -0.15
    water     -0.15
    们        -0.15
    Positive Logits
    Token     Logit
    aves      0.28
    aved      0.26
    ighb      0.24
    avings    0.21
    aver      0.21
    arily     0.21
    arnings   0.21
    evil      0.20
    eded      0.20
    eping     0.20
    Activation Density: 0.020%

    No Known Activations