INDEX
    Explanations

    front or before in multiple languages

    Negative Logits
     прежде
    -0.11
     PRI
    -0.10
     Prior
    -0.10
     overhead
    -0.09
    athers
    -0.09
    antic
    -0.09
     buzz
    -0.09
     prior
    -0.09
    ereo
    -0.08
     Ned
    -0.08
    Positive Logits
     devant
    0.54
     front
    0.47
     frente
    0.42
    front
    0.38
     ante
    0.30
     앞
    0.30
     перед
    0.30
    前
    0.30
    _front
    0.29
     önünde
    0.28
    Act Density 0.134%

    No Known Activations