INDEX
    Explanations
    Negative Logits
    Token          Logit
    "Speed"        -0.90
    " Speed"       -0.81
    " speed"       -0.79
    "SPEED"        -0.64
    " SPEED"       -0.63
    "speed"        -0.59
    "peed"         -0.54
    " speeds"      -0.50
    " speeding"    -0.49
    " speedy"      -0.47
    Positive Logits
    Token          Logit
    " يتيمه"       0.99
    " myſelf"      0.96
    " itſelf"      0.95
    " pleaſure"    0.87
    " ―――――"       0.85
    " purpoſe"     0.84
    " tfsi"        0.84
    " Eſ"          0.84
    " raiſ"        0.83
    " unſ"         0.82
    Act Density 0.029%

    No Known Activations