INDEX
    Explanations

    references to personal relationships and family connections

    Negative Logits
    'asper'     -0.16
    'åĩĿ'       -0.16
    'ucken'     -0.16
    'Unnamed'   -0.15
    'IDGET'     -0.15
    'ÅĻes'      -0.15
    'imizer'    -0.14
    '.prompt'   -0.14
    'ediator'   -0.14
    'ıcı'       -0.14
    Positive Logits
    ' dangerous'    0.17
    ' danger'       0.17
    ' jeopard'      0.16
    ' secrets'      0.16
    ' dangerously'  0.15
    ' both'         0.15
    ' worse'        0.15
    ' deeper'       0.15
    ' deep'         0.15
    ' darkest'      0.15
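Dashboards of this kind typically derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking tokens by the result. The following is a minimal pure-Python sketch under that assumption. The function names `logit_effects` and `top_logits`, and the toy vocabulary and weights, are hypothetical and not taken from this dashboard.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through the unembedding.

    w_u has d_model rows and vocab_size columns; the result has one value per token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: a 3-dimensional residual stream and a 5-token vocabulary.
vocab = ["a", "b", "c", "d", "e"]
decoder_dir = [1.0, 0.0, -1.0]
w_u = [
    [0.5, 0.1, 0.0, -0.2, 0.3],
    [9.0, 9.0, 9.0, 9.0, 9.0],  # ignored, since decoder_dir[1] == 0
    [0.1, 0.4, -0.3, 0.0, 0.3],
]
neg, pos = top_logits(logit_effects(decoder_dir, w_u), vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other. With a real model, the same projection would run over the full vocabulary, usually as a single matrix-vector product.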
    Activation Density: 0.374%
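Activation density is usually read as the percentage of token positions, across a sample of text, at which the feature fires. At 0.374%, that is roughly 3.7 tokens in every 1,000. Below is a minimal sketch assuming "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold default are hypothetical choices, not this dashboard's documented definition.

```python
from typing import List


def activation_density(activations: List[List[float]], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    `activations` holds one list of per-token activations per sequence.
    The threshold of 0.0 is an assumption for illustration.
    """
    total = 0
    active = 0
    for seq in activations:
        for value in seq:
            total += 1
            if value > threshold:
                active += 1
    return 100.0 * active / total if total else 0.0


# Two toy sequences of four tokens each; the feature fires on two of eight tokens.
acts = [[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 3.1, 0.0]]
print(activation_density(acts))  # 25.0
```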

    No Known Activations