    Explanations

    questions and short answers

    Negative Logits
    告诉我  0.51
     অবহিত  0.42
     told  0.40
     성공  0.40
     dichos  0.39
     ок  0.39
    这意味着  0.38
    ΤΑ  0.37
     ok  0.37
     potwier  0.37
    Positive Logits
    短い  0.59
     питання  0.57
     вопро  0.56
     корот  0.55
    (token not shown)  0.54
     Spoiler  0.54
     ngắn  0.53
    Short  0.53
     short  0.52
    Spoiler  0.52
    Activation Density: 0.013%

    No Known Activations