    Negative Logits
    -0.10
     continuity
    -0.08
    continu
    -0.08
     continuation
    -0.07
    \\/
    -0.07
     전달
    -0.07
     지속
    -0.07
     affordability
    -0.07
    -0.07
     sympath
    -0.07
    Positive Logits
     fresh      0.10
    -clean      0.09
     temiz      0.09
     fresca     0.09
     limpa      0.09
     limpio     0.08
     limpia     0.08
     freshly    0.08
    Clean       0.08
     فار        0.08
    Activation Density: 0.006%

    No Known Activations