    Explanations

    phrases that highlight exceptions or specific instances within a broader context

    Negative Logits
    instr
    -0.07
    ën
    -0.07
    inz
    -0.07
    ocha
    -0.07
    кра
    -0.06
    574
    -0.06
    inas
    -0.06
    лід
    -0.06
    amac
    -0.06
    -IN
    -0.06
    Positive Logits
     case
    0.12
     in
    0.12
     neste
    0.10
     caso
    0.10
     cases
    0.10
    場合は
    0.09
     here
    0.09
     Case
    0.09
    case
    0.09
    กรณ
    0.09
    Act Density 0.115%

    No Known Activations