    Negative Logits
    .Aggressive  -0.08
     dishes  -0.07
     Xia  -0.07
     Václav  -0.07
     recordings  -0.06
    ataka  -0.06
     rounded  -0.06
    cole  -0.06
    .tick  -0.06
     academy  -0.06
    Positive Logits
     Alt  0.08
     alt  0.08
    _ALT  0.07
    '];?>↵  0.06
    ;s  0.06
    	verify  0.06
    Alt  0.06
    {\  0.06
    [Y  0.06
    ź  0.06
    Act Density 0.003%

    No Known Activations