INDEX
    Explanations

    special characters or punctuation marks in the text

    Negative Logits
    ment      -0.70
    ous       -0.63
    ше        -0.63
    isson     -0.63
    ness      -0.62
    an        -0.62
    ligen     -0.61
     Montal   -0.59
    ism       -0.59
     McCar    -0.58
    Positive Logits
    "}        1.77
    '}        1.68
    "]}       1.64
    ']}       1.62
    ]")]      1.56
    ]}        1.55
    )}        1.55
    ")}       1.52
     "}       1.50
    })}       1.47
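A minimal sketch of how ranked logit lists like these can be produced, assuming each score is the dot product of the feature's decoder direction with a token's unembedding vector (a common convention for sparse-autoencoder feature dashboards, not something this page states). The names `decoder_direction`, `unembedding`, `logit_effects`, and `top_and_bottom` are hypothetical, and the vectors are made-up toy values, not real model weights.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction raises or lowers their logits. Assumes score = dot product of
# the decoder direction with each token's unembedding vector.

def logit_effects(decoder_direction, unembedding):
    """Return {token: score}, one dot product per token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }

def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy 3-dimensional example (made-up vectors, not real model weights).
decoder_direction = [1.0, -0.5, 0.0]
unembedding = {
    '"}': [1.5, -0.5, 0.2],
    "]}": [1.0, -0.2, 0.0],
    "ment": [-0.4, 0.6, 0.1],
    "ous": [-0.2, 0.4, 0.3],
}

positive, negative = top_and_bottom(logit_effects(decoder_direction, unembedding), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; with a real vocabulary you would compute the scores as one matrix-vector product instead of a Python loop.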
    Activation Density: 0.346%
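A minimal sketch of what the activation density figure (0.346%) measures, assuming it is the percentage of tokens on which the feature's activation is strictly positive; at that rate the feature fires on roughly 3.46 of every 1,000 tokens. The function name `activation_density`, the zero threshold, and the toy activations are assumptions for illustration, not values from this dashboard.

```python
# Hypothetical sketch: activation density = percentage of tokens whose
# feature activation exceeds a threshold (assumed here to be 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 10,000 toy token activations, of which 35 are non-zero.
acts = [0.0] * 10_000
for i in range(35):
    acts[i * 100] = 1.0

print(f"Activation density: {activation_density(acts):.3f}%")  # prints "Activation density: 0.350%"
```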

    No Known Activations