INDEX
    Explanations

    Britannica or Wikipedia

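    Negative Logits
    Token               Logit
    "y"                 2.27
    "are"               1.87
    "د"                 1.78
    "ها"                1.77
    "ер"                1.74
    "ذ"                 1.70
    "رو"                1.68
    (token missing)     1.67
    (token missing)     1.66
    (token missing)     1.65

    Positive Logits
    Token               Logit
    " lush"             2.05
    " flourish"         1.88
    " exceed"           1.86
    " illness"          1.86
    " sull"             1.82
    " aback"            1.78
    (token missing)     1.76
    " porcelain"        1.75
    " keyDown"          1.74
    "很是"              1.73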
    Activation Density: 0.001%

    No Known Activations