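    Negative Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
    Token           Logit
    "issions"       -0.07
    " inhibitor"    -0.07
    " infection"    -0.07
    "위원회"        -0.06
    "ission"        -0.06
    "ULL"           -0.06
    "Full"          -0.06
    " kernels"      -0.06
    "ุมชน"          -0.06
    "вол"           -0.06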
    Positive Logits
    Token           Logit
    " stere"         0.18
    " stereo"        0.12
    " Ster"          0.10
    " Stereo"        0.10
    "Ster"           0.10
    " stereotypes"   0.10
    "ereo"           0.08
    "ieri"           0.08
    " stereotype"    0.08
    "tere"           0.08
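    Lists like these are often produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method; the helper `top_logits`, the array shapes, and the toy vocabulary and weights are hypothetical, and any normalization a dashboard might apply is not modeled.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper. Assumes decoder_dir has shape (d_model,) and W_U has
    shape (d_model, d_vocab), so decoder_dir @ W_U gives the feature's direct
    contribution to every token's logit.
    """
    logits = decoder_dir @ W_U            # shape (d_vocab,)
    order = np.argsort(logits)            # token indices, most negative first
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up numbers).
vocab = [" stereo", " infection", " kernels", "Full"]
W_U = np.array([[1.0, -0.5,  0.0, 0.2],
                [0.0,  0.1, -0.3, 0.0]])
decoder_dir = np.array([0.2, 0.1])

neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    In the toy example, " stereo" receives the largest positive logit and " infection" the most negative one, mirroring how the tables rank tokens by the sign and size of the feature's direct effect.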
    Activation Density: 0.004%

    No Known Activations