    NEGATIVE LOGITS
    Token (quoted to show leading spaces)    Logit
    " content"    -0.82
    "コンテンツ"    -0.77
    "content"    -0.73
    "Content"    -0.72
    " قطاع"    -0.71
    "atrième"    -0.71
    "ceded"    -0.70
    " Mounts"    -0.69
    " сред"    -0.67
    " المحت"    -0.66
    POSITIVE LOGITS
    Token (quoted to show leading spaces)    Logit
    " type"    2.13
    "type"    1.99
    " Type"    1.98
    "Type"    1.80
    "TYPE"    1.73
    " types"    1.71
    " TYPE"    1.68
    " Types"    1.53
    "types"    1.53
    "Types"    1.46
    Activation Density: 0.055%

    No Known Activations