INDEX
    Explanations
    NEGATIVE LOGITS
    Token            Logit
    " studied"       -0.07
    "建設"           -0.07
    " Ke"            -0.07
    "ْع"             -0.07
    "Move"           -0.06
    ".compiler"      -0.06
    " kanıt"         -0.06
    " Spare"         -0.06
    "Sys"            -0.06
    "ßer"            -0.06

    POSITIVE LOGITS
    Token            Logit
    " Republic"      0.09
    " republic"      0.09
    "Republic"       0.08
    "ublic"          0.08
    " scrolls"       0.07
    " mocking"       0.07
    " bankrupt"      0.07
    " Democracy"     0.07
    " wreak"         0.06
    " refuse"        0.06

    Activation Density: 0.007%

    No Known Activations