    Negative Logits
     Hük            -0.07
     Apparel        -0.06
    Coding          -0.06
    律宾            -0.06
     hodnocení      -0.06
    Acceleration    -0.06
     Florence       -0.06
    .PLL            -0.06
     averages       -0.06
    жди             -0.06

    Positive Logits
    ist              0.14
    IST              0.13
     nationalist     0.11
    unist            0.10
     Socialist       0.10
    ists             0.10
    hist             0.09
    ista             0.09
     feminist        0.09
    iste             0.09
    Activation Density: 0.024%

    No Known Activations