INDEX
    Explanations

    words indicating limitation or exclusivity

    Negative Logits
    contri        -0.18
    ownership     -0.15
    zan           -0.14
    anner         -0.14
    anners        -0.13
    oggles        -0.13
    ancel         -0.13
    وسی           -0.13
     already      -0.13
    igger         -0.13

    Positive Logits
    HIR            0.17
    缼             0.17
    brains         0.16
     only          0.15
    hey            0.15
    Broken         0.15
    égorie         0.15
     interested    0.14
     really        0.14
     toler         0.14
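The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the k lowest and k highest entries. The sketch below uses random toy weights, so its numbers mean nothing; `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption (hypothetical): the feature's effect on each vocabulary logit
    is approximated by decoder_dir @ W_U, ignoring any final layer norm.
    """
    effects = decoder_dir @ W_U            # one value per vocabulary token
    order = np.argsort(effects)            # indices in ascending order
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (not the real model's parameters).
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
W_U = rng.normal(size=(d_model, vocab_size))
decoder_dir = rng.normal(size=d_model)
decoder_dir /= np.linalg.norm(decoder_dir)  # unit-norm decoder direction

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=10)
for tok, val in neg[:3] + pos[:3]:
    print(f"{tok}\t{val:.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent: every value in the negative list is at or below every value in the positive list.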
    Act Density 0.062%
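Act Density is presumably the fraction of token positions on which the feature fires; the threshold and dataset behind the 0.062% figure are not stated here. Assuming "fires" means an activation above zero, 0.062% corresponds to roughly 62 positions per 100,000 tokens, as this minimal sketch shows (`activation_density` is a hypothetical helper):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): density is measured with a strict > 0 cutoff.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

acts = np.zeros(100_000)
acts[:62] = 1.0  # the feature fires on 62 of 100,000 positions
density = activation_density(acts)
print(f"{density:.3%}")  # 0.062%
```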

    No Known Activations