    Negative Logits
    Token           Logit
    " juggling"     -0.68
    " hob"          -0.62
    "arantine"      -0.61
    "JP"            -0.60
    " tray"         -0.58
    "prime"         -0.57
    "ļéĨĴ"          -0.57
    " disp"         -0.57
    "nces"          -0.57
    "jp"            -0.57
    Positive Logits
    Token           Logit
    " 29"           1.11
    " 27"           1.06
    " 28"           1.05
    " 26"           1.03
    " 31"           1.02
    "flower"        1.00
    " 23"           0.98
    "fair"          0.96
    " 19"           0.95
    " 22"           0.94
    Activation Density: 0.247%

    No Known Activations