    Negative Logits
    Token           Logit
                    -0.09
    "úmero"         -0.08
    " Become"       -0.08
    "-final"        -0.08
    " atrav"        -0.08
                    -0.08
    " entw"         -0.08
    "513"           -0.07
    " باس"          -0.07
    "-faced"        -0.07
    Positive Logits
    Token           Logit
    " కోర"           0.10
    "criteria"       0.10
    " liking"        0.10
    " gostar"        0.10
    "Criteria"       0.10
    " criteria"      0.09
    " нравится"      0.09
    " kriter"        0.09
    " desider"       0.09
    "desired"        0.09
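The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to build such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. That this page uses exactly this method is an assumption, not something it states. The sketch below uses hypothetical names (`w_dec`, `w_u`, `logit_effects`, `top_bottom`) and toy tokens and weights, not real model data.

```python
from typing import Dict, List, Tuple

def logit_effects(w_dec: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of a (hypothetical) feature decoder
    direction with that token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(w_dec, col)) for tok, col in w_u.items()}

def top_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy 2-dimensional residual stream and a 4-token vocabulary (made-up values).
    w_dec = [1.0, -0.5]
    w_u = {
        "tok_a": [0.2, 0.0],
        "tok_b": [0.1, -0.1],
        "tok_c": [-0.05, 0.05],
        "tok_d": [0.0, 0.1],
    }
    neg, pos = top_bottom(logit_effects(w_dec, w_u), k=2)
    print([t for t, _ in neg])  # ['tok_c', 'tok_d']
    print([t for t, _ in pos])  # ['tok_a', 'tok_b']
```

In a real model the unembedding is a d_model × vocab matrix; storing it as a dict of per-token columns here only keeps the sketch dependency-free.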
    Activation density: 0.043%

    No known activations.
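Reading activation density as the share of sampled token positions on which the feature fires (activation above zero), a density of 0.043% means the feature fires on roughly 43 of every 100,000 tokens. The definition, the zero threshold, and the hypothetical `activation_density` helper below are assumptions for illustration; the page does not spell out how the figure is computed.

```python
from typing import Sequence

def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (an assumed definition of activation density)."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

if __name__ == "__main__":
    # Toy data: 43 nonzero activations spread over 100,000 token positions.
    acts = [0.0] * 100_000
    for i in range(0, 43 * 1000, 1000):
        acts[i] = 1.5
    print(f"{activation_density(acts):.3f}%")  # 0.043%
```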