    NEGATIVE LOGITS
    Token              Logit
    " elements"        -0.07
    "(sum"             -0.07
    " foundations"     -0.07
    " desk"            -0.06
    " Dodd"            -0.06
    "Capabilities"     -0.06
    ".Make"            -0.06
    " Мик"             -0.06
    " سنگ"             -0.06
    " crafts"          -0.06
    POSITIVE LOGITS
    Token              Logit
    "ilip"             0.07
    "ucc"              0.07
    " εμφ"             0.06
    [token missing]    0.06
    "/.↵↵"             0.06
    " краще"           0.06
    "ним"              0.06
    " Stafford"        0.06
    "..↵↵"             0.06
    "ковий"            0.06
    Activation Density: 0.006%

    No Known Activations