INDEX
    Explanations
    Negative Logits
     questa        0.97
     linda         0.93
     ouro          0.91
     tịch          0.90
     Prefeitura    0.89
     planejamento  0.89
     benessere     0.89
     boa           0.88
     piace         0.88
     heen          0.88
    Positive Logits
    ر  0.94
    ه‌ها  0.88
    د  0.85
    هه  0.84
    ه‌های  0.81
    ه‌ای  0.81
    ب  0.80
    FOM  0.79
    ستا  0.78
    ج  0.78
    Activation Density: 0.001%

    No Known Activations