    Negative Logits
    Token           Logit
    " consum"       -0.07
    "起こ"          -0.07
    "福利"          -0.06
    " intellect"    -0.06
    ".pol"          -0.06
    " Shel"         -0.06
    "tiles"         -0.06
    "awns"          -0.06
    " Strings"      -0.06
    " Kings"        -0.06
    Positive Logits
    Token           Logit
    "otten"         0.07
    "unteers"       0.07
    "Going"         0.06
    "SAT"           0.06
    " hiking"       0.06
    " Number"       0.06
    "ümüz"          0.06
                    0.06
    " served"       0.06
                    0.06
    Activation Density: 0.108%

    No Known Activations