INDEX
    Explanations

    titles or references to authoritative figures

    references to authority figures and their roles, particularly those related to the terms "lord" and "woman"

    Negative Logits
        ¥ŀ            -0.86
        unes          -0.66
         skelet       -0.65
         Citiz        -0.65
         Palestin     -0.64
         widest       -0.61
        itialized     -0.60
         lightweight  -0.60
         insulation   -0.59
        ogi           -0.59
    Positive Logits
        lord          0.97
        hood          0.85
        lords         0.81
        hyde          0.80
        pool          0.76
        der           0.76
        ëĭ            0.75
        hattan        0.75
        ifest         0.74
        ipop          0.74
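The logit lists above rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). The page does not say how they were computed; a common approach in feature dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the most negative and most positive results. The following is a minimal pure-Python sketch under that assumption. The names `logit_effects` and `top_k` and the toy vectors are hypothetical, and the sketch does not reproduce the values listed on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: assumes a feature's logit effect on a token is the dot
# product of its decoder direction with that token's unembedding vector.
Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Return the dot product of decoder_dir with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_k(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split effects into the k most negative and the k most positive entries."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional vectors; the values are made up for illustration only.
    toy_unembed = {
        "tok_a": [0.8, 0.3],
        "tok_b": [0.6, 0.4],
        "tok_c": [-0.5, -0.3],
        "tok_d": [-0.2, -0.1],
    }
    effects = logit_effects([1.0, 0.5], toy_unembed)
    negative, positive = top_k(effects, 2)
    print("Negative:", [(t, round(v, 2)) for t, v in negative])
    print("Positive:", [(t, round(v, 2)) for t, v in positive])
```

With a real model the unembedding is a large matrix and the projection is a single matrix-vector product; the dictionary form here only keeps the sketch dependency-free.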
    Activation Density: 0.022%
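The activation density of 0.022% is presumably the share of token positions on which the feature fires, expressed as a percentage; under that reading it corresponds to roughly 22 of every 100,000 tokens. The sketch below assumes that definition with a firing threshold of zero. The function name `activation_density`, its `threshold` parameter, and the toy input are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes density = (# positions with activation > threshold) / (# positions).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy input: 22 firing positions out of 100,000 (made-up numbers).
    toy_acts = [0.0] * 99_978 + [1.3] * 22
    print(f"{activation_density(toy_acts):.3f}%")
```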

    No Known Activations