INDEX
    Explanations
    NEGATIVE LOGITS
    "flake"       -0.07
    "prak"        -0.07
    ".cloud"      -0.07
    " knife"      -0.07
    "-parse"      -0.06
    ".simple"     -0.06
    "ithmetic"    -0.06
    "\tchildren"  -0.06
    "-Smith"      -0.06
    " ск"         -0.06
    POSITIVE LOGITS
    " Tour"     0.21
    " tour"     0.20
    " TOUR"     0.17
    "Tour"      0.16
    " tours"    0.14
    " toured"   0.13
    "tour"      0.13
    " touring"  0.12
    "our"       0.12
    " Tours"    0.11
    Activation Density: 0.008%

    No Known Activations