INDEX
    Explanations

    phrases relating to the nature and consequences of objects, particularly dangerous ones like guns, and societal issues surrounding them

    Negative Logits
    "jvu"            -0.16
    "ÄŁer"           -0.15
    "orners"         -0.14
    "oku"            -0.14
    "ementia"        -0.14
    "ÑĪив"           -0.14
    "afort"          -0.14
    "̧"              -0.14
    "estr"           -0.14
    "mazon"          -0.14

    Positive Logits
    " alarm"         0.29
    " disappoint"    0.28
    " shock"         0.27
    " refresh"       0.25
    " delight"       0.25
    "alarm"          0.24
    " distress"      0.24
    " disarm"        0.24
    " impress"       0.24
    " anything"      0.23
    Act Density 0.473%

    No Known Activations