INDEX
    Negative Logits
    Token                        Logit
    "σεις"                       -0.79
    "INAS"                       -0.78
    "těte"                       -0.78
    "lips"                       -0.77
    "𝔱"                          -0.77
    " obrá"                      -0.77
    "спери"                      -0.76
    (token missing in source)    -0.75
    "CardContent"                -0.75
    (token missing in source)    -0.74
    Positive Logits
    Token          Logit
    " sad"         4.78
    " sadness"     3.69
    "sad"          3.53
    "Sad"          3.52
    " Sad"         3.42
    " saddened"    2.77
    " saddest"     2.72
    " SAD"         2.50
    " triste"      2.38
    "SAD"          2.31
    Activation Density: 0.031%

    No Known Activations