    NEGATIVE LOGITS
    " either"        -1.40
    " both"          -1.30
    " mindestens"    -1.21
    " Either"        -1.20
    " it"            -1.19
    "不管是"         -1.18
    " Eigentü"       -1.17
    "It"             -1.13
    " there"         -1.13
    "Either"         -1.11
    POSITIVE LOGITS
    " gani"                 1.39
    " kampen"               1.38
    " nabi"                 1.37
    "ߋ"                     1.35
    " egens"                1.27
    " klima"                1.26
    [token not rendered]    1.26
    " how"                  1.23
    " nivå"                 1.22
    " egent"                1.22
    Activation Density: 0.076%

    No Known Activations