    Negative Logits
                         0.72
     suppresses          0.70
     orthogonality       0.68
     puppy               0.67
     plumbers            0.66
     actuation           0.65
                         0.64
     attendant           0.64
    TouchEvent           0.64
     développe           0.63
    Positive Logits
     وفي                 0.92
    IN                   0.91
                         0.90
                         0.89
                         0.84
    ٬                    0.78
    D                    0.77
                         0.77
                         0.75
     savor               0.74
    Activation density: 0.004%

    No known activations.