    Explanations

    entities and terms related to authority, academia, and societal structure

    Negative Logits
    ","       -0.49
    " in"     -0.47
    " and"    -0.44
    " of"     -0.44
    " a"      -0.42
    " to"     -0.41
    " "       -0.41
    " ("      -0.41
    "-"       -0.40
    " on"     -0.40
    Positive Logits
    "_REF"    0.26
    "私の"    0.26
    "のか"    0.25
    "资料"    0.25
    "スの"    0.24
    "震"      0.24
    "れて"    0.24
    " دریا"   0.23
    "土地"    0.23
    "イス"    0.23
    Activation Density: 0.060%

    No Known Activations