INDEX
    Explanations
    Negative Logits
    ções        0.46
     categor    0.45
                0.44
     in         0.43
     ي          0.42
     métodos    0.41
    实在        0.41
     lavish     0.41
    \           0.41
    ът          0.40
    Positive Logits
    the         0.68
    g           0.66
    de          0.63
    m           0.54
    p           0.53
    to          0.51
    n           0.51
    v           0.48
    h           0.47
    al          0.47
    Activation Density: 0.002%

    No Known Activations