INDEX
    NEGATIVE LOGITS
     للمعارف    -0.93
     мәкал    -0.91
     незавершена    -0.87
     tartalomajánló    -0.81
    TagMode    -0.79
     $_(    -0.78
     queſta    -0.77
     transfieras    -0.77
     autorytatywna    -0.75
     indígen    -0.73
    POSITIVE LOGITS
    :    0.43
    ,    0.43
    сток    0.42
     used    0.40
     fark    0.39
     deles    0.38
    ;    0.38
     (    0.37
    .    0.36
    '    0.36
    Activation Density: 0.014%

    No Known Activations