    Explanations

    concerns or conflicts related to ethics, positions of authority, or potential wrongdoings

    Negative Logits
        Token                   Logit
        " adv"                  -0.56
        " Realms"               -0.54
        "uster"                 -0.54
        "atical"                -0.53
        "ionics"                -0.53
        " viz"                  -0.52
        " innocuous"            -0.52
        "ãĥ"                    -0.51
        "ibles"                 -0.51
        "isSpecialOrderable"    -0.50

    Positive Logits
        Token                   Logit
        " similarly"            0.70
        " similar"              0.67
        "coni"                  0.63
        " these"                0.59
        "velt"                  0.58
        "sylv"                  0.58
        " meanwhile"            0.56
        "FontSize"              0.55
        " unaffected"           0.55
        " this"                 0.54
    Activation Density: 1.213%

    No Known Activations