    Explanations

    variations of the word "use" in different contexts

    Negative Logits
        "layers"        -0.16
        "ét"            -0.15
        "oux"           -0.15
        "ayer"          -0.15
        "acie"          -0.14
        " xin"          -0.14
        " Kinder"       -0.14
        "852"           -0.13
        "adera"         -0.13
        "ayers"         -0.13
    Positive Logits
        "hic"           0.15
        " Shoulder"     0.14
        "erde"          0.14
        "FE"            0.14
        "mouseup"       0.13
        "frames"        0.13
        "mir"           0.13
        "маÑĤ"          0.13
        ".Formatting"   0.13
        "ĻĤ"            0.13
    Act Density 0.046%

    No Known Activations