INDEX
    Explanations

    references to fear and danger

    Negative Logits
     twink          -0.16
    åde             -0.15
     natural        -0.14
    ifi             -0.14
     (              -0.14
     Lyons          -0.13
    822             -0.13
     naturally      -0.13
     Burst          -0.13
     recommended    -0.13

    Positive Logits
    à¥įरण           0.15
    ä¼¼çļĦ          0.14
    LBL             0.14
     è¡ĮæĶ¿         0.14
    uncate          0.14
    HIR             0.14
    afen            0.14
    intent          0.14
    кап             0.14
     INLINE         0.13
    Activation Density 0.606%

    No Known Activations