    Explanations

    punctuation indicating speech or quotations

    Negative Logits
     PL         -0.47
     tuyệt      -0.44
    شمار        -0.44
    ล่า         -0.44
     λε         -0.44
     censi      -0.43
    dopodob     -0.43
    ボル        -0.42
                -0.42
     tens       -0.42
    Positive Logits
    ).”     1.18
    )."     1.17
    .’”     1.16
    ?”      1.12
    ?"      1.11
    .'"     1.09
    ."      1.09
    .”      1.08
    ’.”     1.08
    ),"     1.07
    Act Density 0.272%

    No Known Activations