INDEX
    Explanations

    proper nouns, particularly names and places

    Negative Logits
    urt        -0.15
    icans      -0.15
    è¾ij       -0.15
    (class     -0.14
     Matth     -0.14
    ism        -0.14
    terior     -0.14
    atern      -0.13
    ÃŃ         -0.13
    urgeon     -0.13
    Positive Logits
    urette     0.16
    šov        0.16
    intr       0.16
    ÑĢоÑĪ      0.15
    äge        0.15
    shore      0.15
    ÑĢож       0.14
    aget       0.14
    Ñĥв        0.14
    APA        0.13
    Activation Density: 0.033%

    No Known Activations