INDEX
    Explanations

    numerical values formatted as text in a structured presentation, such as references or lists in a document

    Negative Logits
    Token             Logit
    " millenn"        -0.86
    "anooga"          -0.78
    "querque"         -0.73
    "ionics"          -0.68
    " contests"       -0.67
    " passionate"     -0.66
    " masc"           -0.66
    " tragedies"      -0.66
    " histories"      -0.65
    " mutual"         -0.65
    Positive Logits
    Token             Logit
    "806"             1.17
    "608"             1.14
    "708"             1.13
    "504"             1.13
    "641"             1.13
    "758"             1.12
    "70"              1.11
    "756"             1.11
    "807"             1.11
    "688"             1.10
    Activation Density: 0.378%

    No Known Activations