    Negative Logits
    Token             Logit
    "worldly"         -0.65
    " Jinn"           -0.60
    "frame"           -0.60
    " resemblance"    -0.59
    "frames"          -0.57
    " hub"            -0.57
    " Anarchy"        -0.57
    " opp"            -0.56
    " Revel"          -0.56
    "chin"            -0.55
    Positive Logits
    Token             Logit
    "000"             1.79
    "00"              1.04
    "0000"            0.98
    "001"             0.96
    " 000"            0.88
    "0002"            0.87
    "00000000"        0.84
    "00000"           0.84
    "0001"            0.80
    "600"             0.80
    Activation Density: 0.103%

    No Known Activations