INDEX
    Explanations

    specific words or phrases that indicate personal names or notable entities

    Negative Logits
    ka      -0.32
    li      -0.25
    che     -0.25
    pro     -0.24
    ben     -0.24
    bo      -0.24
    nya     -0.23
    be      -0.23
    la      -0.23
    tr      -0.22
    Positive Logits
    Ùĭ          0.28
    ught        0.25
    eus         0.25
    'nın        0.25
    ’nın        0.25
    issance     0.23
    frica       0.22
    irement     0.22
    ugh         0.21
    ughter      0.21
    Activation Density: 0.902%

    No Known Activations