INDEX
    Explanations
    Negative Logits
     noDo              -0.60
     <>",              -0.58
    DockStyle          -0.53
    arons              -0.51
    salad              -0.51
     Liquor            -0.50
    IntoConstraints    -0.50
     babys             -0.50
    Blon               -0.50
    Slf                -0.50
    Positive Logits
     widespread        1.79
    idespread          1.70
     widely            1.02
    wide               0.92
     wide              0.91
     Widely            0.90
     extensive         0.82
    Extensive          0.81
    Wide               0.81
    广泛               0.79
    Activation Density: 0.003%

    No Known Activations