    Explanations

    words related to testing, trying out, or exploring different options and ideas

    terms related to experimentation and testing

    Negative Logits
    " Cheong"             -0.75
    "mary"                -0.68
    "CLOSE"               -0.66
    "games"               -0.66
    "olulu"               -0.65
    "ens"                 -0.64
    "Calling"             -0.64
    "die"                 -0.64
    "si"                  -0.63
    "BN"                  -0.62

    Positive Logits
    " experimentation"     1.26
    " experimenting"       1.21
    " experimented"        1.13
    " experiments"         0.94
    " tink"                0.88
    "imental"              0.88
    " withd"               0.86
    " experiment"          0.86
    "aults"                0.85
    " Experiment"          0.81

    Act Density 0.007%

    No Known Activations