    Explanations

    instances of specific number words or numerical references

    Negative Logits
    " záv"      -0.15
    "립"         -0.15
    "ilogy"     -0.14
    "δÏĮ"       -0.14
    "ardash"    -0.14
    "kers"      -0.14
    "elp"       -0.14
    "ker"       -0.14
    "ASM"       -0.14
    "ắp"        -0.14
    Positive Logits
    "394"        0.15
    "368"        0.15
    " Lamp"      0.14
    "hower"      0.14
    "ronic"      0.14
    "PUTE"       0.14
    "trash"      0.14
    "valuator"   0.14
    "364"        0.14
    "396"        0.13
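A minimal sketch of how ranked logit lists like the two above are commonly produced for a sparse-autoencoder feature. It assumes the values are the feature's decoder direction projected through the model's unembedding matrix, a frequent convention that this page does not confirm. The names `logit_effects`, `top_logits`, `decoder_dir` and `w_u` are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through an unembedding matrix.

    w_u[i][j] is the weight from residual dimension i to vocabulary token j.
    Returns one logit effect per vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab = len(w_u[0])
    return [sum(decoder_dir[i] * w_u[i][j] for i in range(d_model)) for j in range(vocab)]

def top_logits(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab_strings: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    effects = logit_effects(decoder_dir, w_u)
    ranked = sorted(zip(vocab_strings, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy 2-dimensional residual stream with a 3-token vocabulary.
    neg, pos = top_logits([1.0, -0.5], [[0.2, -0.1, 0.0], [0.0, 0.2, 0.6]], ["a", "b", "c"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Token strings are kept exactly as the tokenizer emits them, including leading spaces such as `" Lamp"`. This is why the lists above quote each token.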
    Activation Density: 0.027%
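A minimal sketch of the activation density figure. It assumes density means the percentage of token positions at which the feature's activation is above zero; the page does not state its exact definition or threshold. The function name `activation_density` and its `threshold` parameter are hypothetical. Under this reading, 0.027% corresponds to about 27 firing positions per 100,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (hypothetical definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

if __name__ == "__main__":
    # 27 positions fire out of 100,000 tokens.
    acts = [1.0] * 27 + [0.0] * (100_000 - 27)
    print(f"{activation_density(acts):.3f}%")
```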

    No Known Activations