INDEX
    Explanations

    expressions of emotional vulnerability and shared experiences

    Negative Logits
    .”—
    -0.75
     although
    -0.69
     للاسماء
    -0.69
    __":
    
    -0.68
    __))
    -0.67
    )";
    
    -0.67
    ",&
    -0.66
    "):
    
    -0.66
     Wicidata
    -0.66
    الإنجليزية
    -0.65
    Positive Logits
    ,
    1.27
     يتيمه
    0.48
     hjär
    0.47
    *,
    0.43
     ögon
    0.42
     ,
    0.42
    $,
    0.41
     فريبيس
    0.41
    .,
    0.41
    न्द्र
    0.40
    Activation Density: 0.542%

    No Known Activations