    Negative Logits
    " formally"    0.95
    " форма"       0.91
    " formal"      0.82
    " Formal"      0.77
    "Formal"       0.77
    "form"         0.76
    "formal"       0.75
    " form"        0.71
    " Form"        0.70
    "Form"         0.67
    Positive Logits
    " before"      0.52
    "before"       0.50
    " sebelum"     0.50
    " trước"       0.47
    " antes"       0.46
    " prije"       0.46
    " قبل"         0.45
    " voordat"     0.44
    " innan"       0.43
    " ennen"       0.42
    Activation Density: 0.001%

    No Known Activations