    Negative Logits
     his               0.38
    (token not shown)  0.37
     bureaucrats       0.37
     intellectuals     0.36
     profitability     0.36
     Gün               0.36
     Michael           0.36
     Spitzen           0.36
     trzeba            0.35
     B                 0.35
    Positive Logits
     देखील             0.53
    也可以             0.51
     सुद्धा             0.50
    (token not shown)  0.50
    Pokud              0.49
     ايضا              0.48
    ר                  0.48
    াস                 0.47
     కూడా              0.47
     tambien           0.46
    Activation Density: 0.310%

    No Known Activations