    Explanations

    terms related to regulations and definitions within a formal plan

    Negative Logits
    "thora"     -0.17
    "lero"      -0.16
    "blo"       -0.16
    "òa"        -0.16
    "eker"      -0.15
    "bert"      -0.15
    "برÛĮ"      -0.15
    "addir"     -0.15
    "ä¸įäºĨ"    -0.15
    "ayd"       -0.15
    Positive Logits
    " means"    0.27
    "means"     0.26
    " mean"     0.23
    "_means"    0.23
    "mean"      0.20
    " Means"    0.20
    " Mean"     0.19
    "Means"     0.18
    ".mean"     0.17
    "Mean"      0.17
    Activation Density: 0.018%

    No Known Activations