INDEX
    Explanations

    phrases related to different types of fences

    words related to legal or ethical breaches

    Negative Logits
     artif       -0.88
     vulner      -0.81
     reflex      -0.73
     bun         -0.68
     sugg        -0.67
     Seym        -0.67
     Eston       -0.66
     misunder    -0.66
     metic       -0.65
     Assy        -0.65
    Positive Logits
    cffffcc       1.17
    <U+FE0F>      1.01
    ──            0.99
    mad           0.94
    talk          0.93
    \-            0.90
    clear         0.89
    null          0.88
    sure          0.88
    closure       0.88
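Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the result. The sketch below assumes exactly that: the effect on each token is the dot product of the decoder direction with that token's unembedding column. The names `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical, and this dashboard may scale or normalize its scores differently.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    Assumes decoder_dir has shape (d_model,) and W_U has shape (d_model, vocab_size);
    the effect on each token is the dot product of the two.
    """
    effects = decoder_dir @ W_U          # shape (vocab_size,)
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [1.0, 1.0, 1.0, 1.0]])
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.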
    Activation Density 0.136%
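Activation density is the fraction of sampled tokens on which the feature fires, so 0.136% corresponds to roughly 1 token in 735. A minimal sketch, assuming "fires" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's own cutoff and token sample are not stated here.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds `threshold`.

    `threshold=0.0` is an assumption; dashboards may use a different cutoff.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy example: a feature that fires on 136 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:136] = 0.7
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.136%"
```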

    No Known Activations