    Negative Logits
    Token          Logit
    ' CTRL'        -0.07
    ' aun'         -0.07
    'nas'          -0.06
    'vinc'         -0.06
    '("('          -0.06
    ',std'         -0.06
    ' cyclists'    -0.06
    'svm'          -0.06
    ' Zhang'       -0.06
    '\tv'          -0.06
    Positive Logits
    Token          Logit
    'ot'           0.11
    'Pot'          0.10
    ' pot'         0.09
    'OT'           0.09
    ' Potter'      0.08
    ' Pot'         0.08
    'cott'         0.08
    ' potent'      0.08
    'OTT'          0.08
    ' OT'          0.07
    Activation Density: 0.071%

    No Known Activations