INDEX
    Negative Logits
    Token            Logit
    "maintained"     1.04
    "expertise"      1.04
    "рів"            1.00
    " maintain"      1.00
    " altos"         1.00
    "اوم"            0.98
    " vistos"        0.95
    " Aufl"          0.95
    " especific"     0.94
    " الطبي"         0.93
    Positive Logits
    Token            Logit
    "💭"             1.70
    "provoking"      1.56
    " aloud"         1.49
    " provoking"     1.29
    (missing)        1.24
    " przewod"       1.21
    "办法"           1.19
    " THINK"         1.17
    " thoughts"      1.15
    (missing)        1.15
    Activation Density: 0.224%

    No Known Activations