INDEX
    Negative Logits
    well        0.45
    φαν         0.45
    она         0.44
    utani       0.44
    alone       0.43
    sit         0.43
    ou          0.42
     Elat       0.42
    a           0.42
    сные        0.41
    Positive Logits
     عمومی      0.47
                0.45
                0.43
     Scotland   0.43
     باقی       0.43
                0.42
    قية         0.42
     seduce     0.41
     Boxing     0.41
     유럽       0.40
    Activation Density: 0.004%

    No Known Activations