    Negative Logits
     preschool      -0.07
     insults        -0.06
     patents        -0.06
     ownership      -0.06
     viewport       -0.06
     withheld       -0.06
    rede            -0.06
     citizenship    -0.06
                    -0.06
     deviations     -0.06
    Positive Logits
     drama          0.18
     Drama          0.16
     dramas         0.14
    rama            0.11
     Dram           0.09
     Blade          0.07
    /gpl            0.07
    .Dense          0.07
    ドラ            0.07
    rez             0.07
    Act Density 0.004%

    No Known Activations