INDEX
    Explanations

    mentions of authority figures and their interactions with others

    Negative Logits
    Token                 Logit
    Tikang                -0.85
    :✨                    -0.72
     ویکی‌پدی              -0.63
    исленность            -0.62
    Hentet                -0.62
     autorytatywna        -0.61
     snippetHide          -0.60
                          -0.60
    setVerticalGroup      -0.59
    InputTagHelper        -0.59
    Positive Logits
    Token                 Logit
     pupils               0.39
     pupil                0.32
     character            0.31
     Kot                  0.29
     موا                  0.29
    gebiete               0.28
    k                     0.28
     Opfer                0.28
     camar                0.28
     comic                0.28
    Activation Density: 0.678%

    No Known Activations