    Negative Logits
    Token            Logit
    " ausgel"        0.40
    (blank)          0.38
    "avant"          0.38
    (blank)          0.38
    "enhanced"       0.37
    "💥"             0.37
    "ຖື"              0.37
    "ÉE"             0.37
    "ോദ"             0.36
    " shrinkage"     0.36
    Positive Logits
    Token            Logit
    " open"          1.14
    " Open"          1.08
    " abiertos"      1.05
    " terbuka"       0.96
    " OPEN"          0.94
    " openness"      0.93
    "Open"           0.93
    " abierta"       0.93
    " opens"         0.91
    " abierto"       0.91
    Activation Density: 0.077%

    No Known Activations