    Explanations

    comparisons and analogies

    analogies and comparisons

    Negative Logits
    FX          -0.76
    alla        -0.73
    amily       -0.72
    amo         -0.67
    formance    -0.67
    xx          -0.65
    etheless    -0.64
    Lua         -0.63
    amina       -0.62
    ij士        -0.62
    Positive Logits
     homework   0.73
     apple      0.72
     aspirin    0.72
     iPod       0.68
     dise       0.65
     Xer        0.64
     Moz        0.63
     puzzle     0.62
     french     0.62
     weights    0.62
    Activation Density: 0.615%

    No Known Activations