INDEX
    Negative Logits
    Token       Value
    "Sil"       -0.08
    "$$$$"      -0.07
    " 아니라"   -0.07
    "jer"       -0.07
    "Arg"       -0.07
    "sist"      -0.07
    " MIL"      -0.06
    "Dict"      -0.06
    "Sol"       -0.06
    "Count"     -0.06
    Positive Logits
    Token           Value
    "annonce"       0.08
    "ainting"       0.08
    "(formatter"    0.07
    "展馆"          0.07
                    0.07
    "(copy"         0.07
    " admission"    0.07
    " pleased"      0.07
    "IVAL"          0.07
    "_fk"           0.07
    Activation Density: 0.054%

    No Known Activations