    Negative Logits
    "leine"          -0.07
    " Exposure"      -0.07
    " Bunifu"        -0.06
    " Su"            -0.06
    " flotation"     -0.06
    " Mik"           -0.06
    " Girls"         -0.06
    "<Product"       -0.06
    "-su"            -0.06
    " Hak"           -0.06
    Positive Logits
    " observed"       0.11
    " surveillance"   0.07
    " observe"        0.07
    " noted"          0.07
    "BX"              0.07
    "‌تر"              0.06
    " discovered"     0.06
    "-Semitism"       0.06
    "benchmark"       0.06
    "наруж"           0.06
    Activation Density: 0.018%

    No Known Activations