    Negative Logits
    Token          Logit
    " їх"          -1.35
    "kehren"       -1.33
    (not shown)    -1.32
    " gogh"        -1.31
    "Then"         -1.31
    (not shown)    -1.30
    "𝘎"            -1.29
    " skak"        -1.28
    " estekak"     -1.27
    (not shown)    -1.27
    Positive Logits
    Token          Logit
    " It"          1.48
    " doesn"       1.42
    "你和"         1.27
    " nobody"      1.24
    " somebody"    1.23
    "ůž"           1.21
    " nothing"     1.20
    " According"   1.20
    " it"          1.18
    "急忙"         1.18
    Activation Density: 0.033%

    No Known Activations