    Explanations

    importance followed by clauses

    Negative Logits
    Detail
    0.37
     southern    0.36
    "></    0.35
     brunes    0.34
    GD    0.33
    нут    0.32
    并将    0.32
    Sav    0.32
     vět    0.32
     testaceis    0.31
    POSITIVE LOGITS
     bahwa    0.79
     أن    0.77
     να    0.68
     أنه    0.65
     that    0.65
     ότι    0.64
     dass    0.62
     että    0.61
     rằng    0.61
     bahawa    0.59
    Act Density 0.022%

    No Known Activations