INDEX
    Negative Logits
    _r                -0.07
     követ            -0.07
    inh               -0.07
    _ing              -0.07
    ugan              -0.07
     unpopular        -0.07
     personality      -0.07
     Burst            -0.07
     Xen              -0.07
    endant            -0.07
    Positive Logits
     counted          0.10
     fewer            0.09
     প্রতিষ্ঠ          0.08
     weniger          0.08
    QUOTE             0.08
     foydalan         0.08
     einzige          0.08
     празднич         0.08
    oncé              0.08
    أس                0.08
    Activation Density: 0.012%

    No Known Activations