    Negative Logits
    Token             Logit
    "r"               -0.21
    "R"               -0.16
    "rifice"          -0.12
    " R"              -0.11
    " r"              -0.11
    "rían"            -0.10
    "ridden"          -0.10
    "ría"             -0.10
    "_r"              -0.10
    " Researchers"    -0.09
    Positive Logits
    Token             Logit
    "(rc"             0.27
    "(rt"             0.26
    "(r"              0.25
    "(rs"             0.25
    "(rd"             0.24
    "(rr"             0.24
    "(rv"             0.23
    "(rb"             0.23
    "(rx"             0.21
    "(ro"             0.20
    Activation Density: 0.071%

    No Known Activations