    NEGATIVE LOGITS
    Token             Logit
    " volte"          -0.08
    "Excluded"        -0.07
    " Funk"           -0.07
    " Northwestern"   -0.07
    " Foster"         -0.07
    " PCP"            -0.07
    "voi"             -0.07
    " cement"         -0.07
    "ledning"         -0.07
    " Pennsylvania"   -0.07
    POSITIVE LOGITS
    Token             Logit
    " atent"           0.12
    "/watch"           0.10
    " listening"       0.09
                       0.09
    " attent"          0.09
    "hin"              0.09
    "ค์"               0.09
    " внимательно"     0.09
    " kepada"          0.08
    " louder"          0.08
    Activation Density: 0.020%

    No Known Activations