INDEX
    Negative Logits
    Token           Logit
    " Pollo"        -0.43
    " prácti"       -0.42
    "Pfe"           -0.41
    "efficiency"    -0.40
    "fine"          -0.40
    " Ojo"          -0.39
    "colli"         -0.38
    "Ctl"           -0.38
    "Ili"           -0.37
    "useful"        -0.37
    Positive Logits
    Token           Logit
    "nam"           2.72
    " nam"          2.09
    "NAM"           2.06
    " NAM"          1.92
    "Nam"           1.91
    " Nam"          1.90
    " Anam"         1.36
    "namn"          1.34
    "naam"          1.23
    "ナム"          1.22
    Activation Density: 0.008%

    No Known Activations