INDEX
    Explanations

    lists or asks questions

    high-frequency function words and structural/formatting tokens (e.g., articles, prepositions, modals, punctuation, and control/section markers).

    Negative Logits
    "あくまで"      0.29
    " subtlety"     0.27
    " intégr"       0.27
    " Anzahl"       0.26
    " playmaker"    0.26
    " červ"         0.25
    " மொத்தம்"       0.25
    " Mutations"    0.25
    "wechsl"        0.24
    " Holds"        0.24
    Positive Logits
    "риа"       0.26
    "様専用"    0.25
    "ларда"     0.25
    "ῶν"        0.24
    "MENTS"     0.23
    "ेलर"       0.23
    "ιου"       0.23
    "اريات"     0.23
    "다가"      0.23
    "ικού"      0.23
    Activation Density: 0.556%

    No Known Activations