    Explanations

    numerical values and quantifiers

    Negative Logits
    Token        Logit
    "avez"       -0.15
    " rog"       -0.14
    "ossip"      -0.14
    "ven"        -0.14
    "arger"      -0.14
    " waves"     -0.14
    " Waves"     -0.14
    " rein"      -0.14
    " finish"    -0.14
    "lox"        -0.13
    Positive Logits
    Token        Logit
    "ural"       0.16
    "pyx"        0.15
    "zin"        0.15
    "_BEGIN"     0.15
    "اÙĪØ±ÛĮ"     0.14
    "Assigned"   0.14
    " gates"     0.14
    "ivec"       0.13
    "ساÙĨÛĮ"      0.13
    "oly"        0.13
    Activation Density: 0.003%

    No Known Activations