    NEGATIVE LOGITS
    Token                 Logit
    "Anything"            0.41
    " planilla"           0.39
    " proš"               0.39
    " Anything"           0.38
    " Shame"              0.38
    (token not shown)     0.38
    " anything"           0.36
    "})}{"                0.36
    "Denn"                0.36
    (token not shown)     0.36
    POSITIVE LOGITS
    Token                 Logit
    " hom"                1.96
    " Hom"                1.73
    "Hom"                 1.63
    "hom"                 1.55
    " homo"               1.30
    " HOM"                1.28
    " homogen"            1.23
    " homogeneous"        1.22
    " homog"              1.20
    " homogenous"         1.17
    Activation Density: 0.010%

    No Known Activations