    Negative Logits
    Token          Value
    " hoax"        0.80
    "TP"           0.79
    " postpon"     0.76
    " n"           0.75
    (blank)        0.75
    " jab"         0.75
    " TP"          0.74
    " haw"         0.73
    " kore"        0.72
    "acak"         0.72
    Positive Logits
    Token          Value
    (blank)        0.96
    (blank)        0.93
    "ش"            0.91
    "බැ"           0.90
    "រស"           0.88
    "يز"           0.88
    (blank)        0.87
    " projetos"    0.85
    (blank)        0.84
    "лни"          0.83
    Activation Density: 0.001%

    No Known Activations