INDEX
    Explanations

    Mentions of numeric values, such as quantities, counts, or amounts, in text or code.

    Negative Logits
        " dajj"          0.46
        "ைகளை"           0.43
        " бк"            0.42
        "<unused1053>"   0.42
        " педагоги"      0.41
                         0.40
        " губер"         0.40
                         0.39
        "<unused980>"    0.39
        " отправ"        0.39
    Positive Logits
        " despite"       0.48
        " and"           0.48
        "and"            0.46
        " "              0.46
        " A"             0.44
        "u"              0.44
        " very"          0.44
        " extremely"     0.43
        "left"           0.43
        "long"           0.42
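    Feature dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. This card does not state its method or how its scores are scaled, so the following is only a minimal pure-Python sketch under that assumption; `top_logits`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the toy numbers are not meant to reproduce the values above.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return the k tokens whose logits a feature direction raises most and lowers most.

    Assumed (hypothetical) inputs:
      decoder_dir: d_model floats, the feature's decoder vector
      unembed:     d_model rows of vocab_size floats (the unembedding matrix W_U)
      vocab:       vocab_size token strings
    """
    d_model = len(decoder_dir)
    # logit_j = sum_i decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive logit.
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dimensional model with a 3-token vocabulary.
pos, neg = top_logits([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(pos, neg)
```

    Slicing the reversed order with `[:k]` keeps the result correct even when `k` is 0 or exceeds the vocabulary size.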
    Activation Density: 0.348%
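    Activation density is usually read as the share of sampled token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the card does not give its threshold); `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation is strictly above `threshold`.

    The zero threshold is an assumption; the card does not state one.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 2 of 8 positions.
print(activation_density([0.0, 0.2, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0]))
```

    Under that reading, a density of 0.348% corresponds to about 3,480 active positions per million tokens.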

    No Known Activations