INDEX
    Explanations
    NEGATIVE LOGITS
    rez       -0.07
    kách      -0.07
    ascular   -0.07
     RTP      -0.06
    vet       -0.06
    writing   -0.06
    كومة      -0.06
    ailles    -0.06
    _ber      -0.06
     nutrit   -0.06
    POSITIVE LOGITS
     candidate   0.07
    ово          0.06
    صد           0.06
    persona      0.06
     flee        0.06
    FRAME        0.05
    ][$          0.05
     extremism   0.05
    宿           0.05
    ترنت         0.05
    Act Density 0.000%

    No Known Activations