    Negative Logits
     모델             -0.07
    Combo           -0.06
     Conexion       -0.06
     '=',           -0.06
     раніше         -0.06
    nten            -0.06
     Sacramento     -0.06
     topology       -0.06
    ertoire         -0.06
     importing      -0.06
    Positive Logits
    食品              0.08
     RELATED        0.07
    "]))            0.07
     vib            0.07
     shim           0.06
     Hate           0.06
                    0.06
     dame           0.06
     ancient        0.06
    kon             0.06
    Activation Density: 0.001%

    No Known Activations