INDEX
    Explanations
    Negative Logits
    " Tu"        -1.19
    "Tu"         -0.84
    " tu"        -0.52
    "ida"        -0.49
    " di"        -0.48
    " den"       -0.45
    " su"        -0.45
    " Regina"    -0.45
    " dem"       -0.43
    " der"       -0.42
    POSITIVE LOGITS
                        0.92
    " autorytatywna"    0.82
    "mybatisplus"       0.78
    " متعلقه"           0.77
    " للاسماء"          0.77
    " fhort"            0.76
    " fubject"          0.76
    "ſelves"            0.75
    "ſelf"              0.75
    "CloseOperation"    0.73
    Act Density 0.163%

    No Known Activations