    Negative Logits
    4             0.38
    ك             0.38
    5             0.35
    (“            0.34
     fehlen       0.33
    (             0.33
    ած            0.32
    )             0.32
                  0.32
    ?)            0.32
    Positive Logits
     rid          0.59
     on           0.55
     get          0.45
     into         0.44
     acquainted   0.44
     люди         0.41
     to           0.41
     people       0.41
     bogged       0.38
     excited      0.38
    Activation Density: 0.073%

    No Known Activations