INDEX
    Explanations

    critical assessments of involved people

    Negative Logits
    " SOCIAL"  0.47
    "𝒟"  0.42
    " অত্যাচার"  0.42
    " γίνεται"  0.40
    " Sosial"  0.40
    " Thereafter"  0.40
    " 対象"  0.40
    "kadang"  0.40
    " Νο"  0.39
    "lava"  0.39
    Positive Logits
    " transpos"  0.50
    " on"  0.44
    " simplicity"  0.44
    "ibility"  0.42
    " salmon"  0.42
    "ти"  0.42
    "ेंसेस"  0.41
    " brazen"  0.41
    " gaze"  0.41
    " circunst"  0.41
    Activation Density: 0.002%

    No Known Activations