INDEX
    Explanations

    numerical values or dates

    Negative Logits
    "nge"   -0.15
    "omin"  -0.15
    "ALER"  -0.15
    "aby"   -0.14
    "ilt"   -0.14
    "och"   -0.14
    "abs"   -0.14
    "åĴ²"   -0.14
    "ück"   -0.14
    "äng"   -0.14
    Positive Logits
    "ig"    0.21
    "-ra"   0.20
    " ele"  0.19
    "-b"    0.19
    " aug"  0.18
    ".fe"   0.18
    " tav"  0.18
    " ut"   0.18
    "-es"   0.17
    " ok"   0.17
    Act Density 0.001%

    No Known Activations