INDEX
    Explanations

    neuron threshold condition

    Negative Logits
    Token                 Value
    "aków"                0.43
    (token not rendered)  0.36
    " Vj"                 0.36
    "पाठ"                 0.35
    "TabIndex"            0.35
    " subcut"             0.35
    "ULATION"             0.35
    (token not rendered)  0.35
    "uania"               0.34
    "kingdom"             0.34
    Positive Logits
    Token       Value
    " Jimmy"    0.45
    "Jimmy"     0.42
    " dope"     0.40
    " swing"    0.40
    " Bann"     0.40
    " leg"      0.39
    " arm"      0.38
    " bann"     0.38
    " hook"     0.38
    " loin"     0.38
    Activation Density: 0.001%

    No Known Activations