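    Tokens are quoted so that leading spaces are visible; (not shown) marks an entry whose token text did not render.

    Negative Logits
    Token             Logit
    (not shown)       -0.81
    "Partici"         -0.77
    " gedachten"      -0.76
    "Passcode"        -0.74
    "gangss"          -0.73
    "signUp"          -0.73
    "illingham"       -0.71
    (not shown)       -0.70
    " italienischen"  -0.70
    "dienste"         -0.69

    Positive Logits
    Token             Logit
    " left"            4.59
    "Left"             3.77
    " Left"            3.70
    (not shown)        3.44
    "left"             3.33
    "LEFT"             2.91
    " LEFT"            2.83
    " right"           2.69
    " 左"              2.58
    (not shown)        2.50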
    Activation Density: 0.055%

    No Known Activations