    Explanations

    references to werewolves or related terms

    Negative Logits
        o        -0.20
        son      -0.18
        a        -0.17
        ing      -0.17
        tons     -0.17
        sb       -0.15
        e        -0.15
        ORG      -0.15
        sg       -0.15
        verte    -0.15
    Positive Logits
        ewolf     0.29
        ghi       0.18
        wolf      0.17
        kiye      0.16
        ediator   0.16
        ìķ½       0.16
        eturn     0.15
        ewith     0.15
        kenin     0.15
        ktop      0.15
    Activation Density: 0.012%

    No Known Activations