INDEX
    Explanations

    phrases related to critical questioning and skepticism towards conventions and authority

    Negative Logits
    Token        Logit
    ardin        -0.16
    illin        -0.16
    ubern        -0.15
     ??          -0.14
    pps          -0.14
    marvin       -0.14
    ufig         -0.14
    ìĬ¤íħĮ       -0.14
    oro          -0.14
     dic         -0.14
    Positive Logits
    Token        Logit
     à¤ĩतन      0.15
    rint         0.15
     Rem         0.15
    екÑĤ         0.14
    اعÙĬ         0.14
    ahu          0.14
    dsp          0.14
     ÏĦÏĮÏĥο    0.14
    LOOR         0.13
    noun         0.13
    Activation Density: 0.145%

    No Known Activations