    Negative Logits
     chief       -0.07
    Hop          -0.07
    _rule        -0.07
     comerc      -0.06
    obj          -0.06
    ソン         -0.06
     pitch       -0.06
     interests   -0.06
    homes        -0.06
     footsteps   -0.06
    Positive Logits
     patriotism   0.07
    podob         0.07
     виготов      0.06
     nelze        0.06
    ifferent      0.06
    _FOUND        0.06
    /react        0.06
    (크기         0.06
     таким        0.06
    ennifer       0.06
    Activation Density: 0.018%

    No Known Activations