    Negative Logits
    ruit          -0.07
    리고          -0.07
    τουργ         -0.06
    Your          -0.06
     (            -0.06
    таки          -0.06
     intention    -0.06
    ationale      -0.06
     материал     -0.06
    .Preference   -0.06
    Positive Logits
     Gundam        0.07
    Fuck           0.07
                   0.06
    Shape          0.06
     хол           0.06
                   0.06
    senal          0.06
     مثلا          0.06
    -di            0.06
     респ          0.06
    Activation Density: 0.021%

    No Known Activations