    Negative Logits
    " colorful"      0.65
    "الة"            0.65
    " Lauder"        0.63
    " flaky"         0.63
    " appreciable"   0.63
    " abbreviated"   0.62
    " disruptive"    0.61
    " colourful"     0.61
    " дві"           0.61
    " pressurized"   0.60

    Positive Logits
    "en"             0.92
    "i"              0.85
    (token not shown) 0.80
    " 사항"          0.80
    "ی"              0.78
    "nach"           0.78
    " nesse"         0.75
    "事項"           0.74
    "াভাব"           0.74
    (token not shown) 0.74
    Activation Density: 0.042%

    No Known Activations