INDEX
    Explanations
    NEGATIVE LOGITS
    sentir         0.90
    conserver      0.88
    informée       0.84
    konserv        0.82
    conserva       0.80
    potenz         0.80
    어렵           0.79
    específico     0.78
    <unused559>    0.77
    stratég        0.77
    POSITIVE LOGITS
    9              0.93
    8              0.91
    6              0.91
    0              0.91
    3              0.91
    7              0.89
    5              0.87
    II             0.86
    </h1>          0.85
    </h3>          0.84
    Activation Density: 0.540%

    No Known Activations