INDEX
    Negative Logits
    Token         Logit
    "нения"       -0.08
    "lander"      -0.07
    " yi"         -0.07
    "rai"         -0.07
    "ASSE"        -0.07
    "inator"      -0.07
    "Assembly"    -0.07
    "peech"       -0.07
    "division"    -0.07
    "ardag"       -0.07
    Positive Logits
    Token         Logit
    " basis"      0.10
    " આધાર"       0.09
    " आधारित"     0.09
    " основе"     0.09
    " Basis"      0.09
    "Basis"       0.09
    "_basis"      0.09
    " ভিত্ত"       0.09
    "циями"       0.08
    " whim"       0.08
    Activation Density: 0.030%

    No Known Activations