INDEX
    Explanations
    Negative Logits
    Token            Logit
    (not shown)      -0.83
    (not shown)      -0.82
    "issaient"       -0.77
    " jelent"        -0.76
    "СТЬ"            -0.76
    "лик"            -0.76
    "nické"          -0.75
    " drank"         -0.74
    "логические"     -0.73
    " vanguardia"    -0.73
    Positive Logits
    Token            Logit
    " CAF"           0.82
    " اختلاف"        0.76
    "Ancestor"       0.73
    (not shown)      0.72
    " earlier"       0.72
    "essentially"    0.72
    "在于"           0.72
    " APM"           0.71
    " meistens"      0.71
    " bound"         0.69
    Activation Density: 0.019%

    No Known Activations