    Negative Logits
    Token            Logit
    " Plate"         -0.81
    " comet"         -0.79
    "хьтан"          -0.79
    " propOrder"     -0.78
    " plate"         -0.75
    " Comet"         -0.74
    "spotify"        -0.72
    "saraba"         -0.71
    "Plate"          -0.69
    " FRAME"         -0.68
    Positive Logits
    Token            Logit
    "word"           0.50
    "post"           0.47
    "war"            0.46
    "iness"          0.43
    "setcounter"     0.43
    " proceeds"      0.42
    "let"            0.42
    "load"           0.42
    "/"              0.41
    "times"          0.41
    Activation Density: 0.251%

    No Known Activations