    Explanations

    expressions of consistency and reliability in various contexts

    Negative Logits
    rice      -0.16
    ough      -0.16
    oise      -0.15
    кав       -0.15
    zen       -0.15
    icina     -0.15
    scribe    -0.14
    aver      -0.14
    oni       -0.14
    lum       -0.14
    Positive Logits
    ently           0.19
    ably            0.17
    aye             0.16
    bred            0.16
     inconsistent   0.16
    ively           0.16
    antly           0.15
    ly              0.15
     across         0.15
     throughout     0.14
    Act Density 0.030%

    No Known Activations