INDEX
    NEGATIVE LOGITS
    Token          Logit
    "orus"         -0.56
    " Calder"      -0.50
    "RTCF"         -0.50
    "uf"           -0.46
    " للمعارف"     -0.44
    " sensitive"   -0.43
    "<eos>"        -0.41
    "})]"          -0.40
    " Zero"        -0.39
    " هش"          -0.39
    POSITIVE LOGITS
    Token          Logit
    " ―――――"       0.77
    "ization"      0.73
    "ized"         0.69
    "eiro"         0.65
    "وأضاف"        0.65
    "izations"     0.63
    "DockStyle"    0.63
    "izers"        0.63
    " ――――――――"    0.63
    "s"            0.62
    Activation Density: 0.023%

    No Known Activations