INDEX
    Explanations

    hex color codes preceded by a '#'

    Negative Logits
    Token           Logit
    "g"             -0.78
    "H"             -0.76
    "G"             -0.73
    "w"             -0.72
    "j"             -0.72
    "r"             -0.71
    "M"             -0.70
    "X"             -0.70
    "W"             -0.68
    "V"             -0.68

    Positive Logits
    Token           Logit
    " Monfieur"     0.64
    " houſe"        0.59
    "ViewFeatures"  0.58
    " poffe"        0.57
    " cauſe"        0.55
    " himſelf"      0.54
    "daad"          0.54
    " chofe"        0.54
    "CEED"          0.52
    "pośred"        0.52

    Activation Density: 1.719%

    No Known Activations