    NEGATIVE LOGITS
    Token     Logit
    "the"     1.84
    "in"      1.34
    "el"      1.31
    "to"      1.31
    "ol"      1.27
    "ні"      1.25
    "v"       1.24
    " S"      1.23
    "n"       1.20
    "y"       1.20
    POSITIVE LOGITS
    Token            Logit
    "ב"              1.54
    " probes"        1.44
    " probe"         1.33
    "其他"           1.30
    " probed"        1.30
    " probing"       1.28
    " benefícios"    1.26
    " legumes"       1.24
    " pię"           1.22
    " neumáticos"    1.21
    Activation Density: 0.004%

    No Known Activations