    Explanations

    numerical values and references to case citations

    Negative Logits
    Token     Logit
    " WL"     -0.58
    "WL"      -0.53
    " mas"    -0.50
    " Ant"    -0.47
    " R"      -0.47
    " and"    -0.44
    " des"    -0.44
              -0.44
    " r"      -0.43
    "ow"      -0.43
    Positive Logits
    Token             Logit
    " الحره"          0.99
    " незавершена"    0.85
    " Савезне"        0.81
    "InitVars"        0.79
    " avoient"        0.79
    " nahilalakip"    0.79
    "twimg"           0.77
    " étoit"          0.75
    " Cæsar"          0.75
    " itſelf"         0.75
    Activation Density: 0.004%

    No Known Activations