    Explanations

    Code and technical language

    Negative Logits
    "Driven"      -0.63
    " Driven"     -0.60
    " fizer"      -0.60
    "Written"     -0.55
    "driven"      -0.55
    " localize"   -0.54
    "ventil"      -0.54
    " learns"     -0.53
    " sogni"      -0.53
    " haft"       -0.52

    Positive Logits
    " attempted"  0.83
    " kept"       0.82
    " stepped"    0.82
    " opened"     0.81
    " returned"   0.81
    " repaired"   0.79
    " made"       0.79
    " picked"     0.78
    " clicked"    0.78
    " passed"     0.77

    Activation Density: 0.012%

    No Known Activations