INDEX
    Explanations

    references to numerical values, particularly the word "ten"

    Negative Logits
    time           -0.20
    side           -0.20
    tica           -0.18
    WithOptions    -0.17
    tiler          -0.17
    tega           -0.17
    tight          -0.17
    athers         -0.17
    ockets         -0.16
    tempts         -0.16
    Positive Logits
    acious         0.35
    ancy           0.33
    acity          0.32
    ured           0.31
    ement          0.30
    ets            0.29
    anted          0.28
    ancies         0.28
    ure            0.26
    ements         0.26
    Act Density 0.009%

    No Known Activations