    Negative Logits
    Token                 Logit
    " vannak"             -0.64
    "polate"              -0.56
    " happen"             -0.55
    " voltak"             -0.55
    " deny"               -0.55
    "MessageTagHelper"    -0.54
    "happen"              -0.54
    " bave"               -0.54
    " tuta"               -0.53
    "tings"               -0.52
    Positive Logits
    Token                 Logit
    " is"                 0.79
    " works"              0.74
    " acts"               0.71
                          0.71
    " all"                0.69
    " burns"              0.68
    " functions"          0.66
    " looks"              0.65
    " gets"               0.65
    " cares"              0.64
    Activation Density: 0.010%

    No Known Activations