    Explanations

    phrases related to consistency and comparison

    instances of the word "the" and its variations, indicating a focus on definite articles or references

    Negative Logits
    Token            Value
    " caches"        -0.67
    "vernment"       -0.63
    "Versions"       -0.60
    "DB"             -0.60
    " briefly"       -0.60
    "ells"           -0.59
    "each"           -0.59
    " periodically"  -0.58
    "mares"          -0.58
    "etheus"         -0.58

    Positive Logits
    Token            Value
    " same"           1.45
    "same"            1.27
    " Same"           1.05
    " easiest"        0.99
    " result"         0.96
    "ologically"      0.96
    " simplest"       0.95
    " opposite"       0.95
    " smallest"       0.94
    " utmost"         0.94
    Act Density 0.230%

    No Known Activations
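A minimal sketch of how lists like the logit tables and the activation density figure are commonly derived. It assumes, without confirmation from this page, that each logit value is the feature's decoder direction projected through the model's unembedding matrix, and that "Act Density" is the fraction of token positions where the feature's activation is positive. The function names `top_logit_effects` and `activation_density`, along with the toy vocabulary and weights, are hypothetical illustrations.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Project one feature's decoder direction through the unembedding
    (assumed setup) and return the k most negative and k most positive
    (token, value) pairs."""
    effects = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(effects)          # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(acts):
    """Fraction of token positions where the feature fires (activation > 0)."""
    acts = np.asarray(acts)
    return float((acts > 0).mean())


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = [" same", " caches", " the", "each"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[1.45, -0.67, 0.10, -0.59],
                [0.00,  0.00, 0.00,  0.00]])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
print(activation_density([0.0, 0.3, 0.0, 0.0]))
```

Leading spaces are kept inside the token strings because tokenizers often treat " same" and "same" as distinct tokens, which is why both appear separately in the positive list above.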