INDEX
    NEGATIVE LOGITS
    Token               Logit
    " aktiven"          -0.89
    " erheben"          -0.87
    "</blockquote>"     -0.87
    " Titles"           -0.85
    "ZZLE"              -0.85
    "セス"              -0.84
    "ுக"                -0.83
    "O"                 -0.81
    " einzigen"         -0.81
    " owed"             -0.79
    POSITIVE LOGITS
    Token               Logit
    " pride"            1.51
    "pride"             1.48
    "Pride"             1.48
    " Pride"            1.32
    "Proud"             1.12
    " bendera"          1.11
    "parade"            1.09
    "proud"             1.09
    " гор"              1.07
    " kalem"            1.01
    Activation density: 0.006%

    No Known Activations