    Explanations

    concepts related to emotional complexity and self-awareness

    Negative Logits
     nakalista           -0.86
    ViewFeatures         -0.81
     bezeichneter        -0.80
    /*                   -0.79
     Offisielt           -0.77
    aarrggbb             -0.77
    extAlignment         -0.75
    MessageTagHelper     -0.74
     calendriers         -0.73
    explique             -0.73
    Positive Logits
     foolish             0.43
     obicei              0.42
                         0.42
     foresight           0.42
     enough              0.41
    smart                0.40
     αλ                  0.39
    начала               0.39
    ziplin               0.39
     dumb                0.38
    Activation Density: 0.312%

    No Known Activations