INDEX
    Explanations

    the word "almost" with activation values of 9 or 10

    Negative Logits
    "oran"       -1.21
    "agate"      -1.10
    "oris"       -1.08
    "erion"      -1.04
    "eria"       -1.03
    "alam"       -0.99
    " 裏覚醒"     -0.97
    "osis"       -0.97
    "oland"      -0.94
    "achus"      -0.94
    Positive Logits
    "stress"          1.10
    " mundane"        1.07
    " certainly"      1.04
    "zero"            0.95
    " identical"      0.94
    "rito"            0.91
    " exclusively"    0.90
    " instinct"       0.88
    "arser"           0.88
    "lex"             0.87
    Act Density 0.498%
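
A minimal sketch of how a density figure like the one above could be computed, assuming "Act Density" means the percentage of sampled tokens on which the feature has a nonzero activation. The source does not define the term, so this reading is an assumption. The function name `activation_density` and the sample values are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, 5 of which activate the feature.
sample = [0.0] * 995 + [9.0, 10.0, 9.5, 0.3, 1.2]
print(f"Act Density {activation_density(sample):.3f}%")
```

A strict `>` comparison is used so that exact zeros, which is what a ReLU-style feature outputs on inactive tokens, do not count toward the density.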

    No Known Activations