INDEX
    Explanations

    words indicating relationships or commonalities between entities or actions

    Negative Logits
     "         -1.10
     “         -0.88
    '],'       -0.78
     '         -0.69
     mena      -0.67
               -0.65
    dina       -0.63
    taler      -0.63
    ']").      -0.63
    alapa      -0.63
    Positive Logits
     Efq            1.12
     itſelf         1.01
     Cæsar          1.00
    quele           0.95
    soever          0.92
    ostante         0.92
    withstanding    0.90
     auffi          0.90
     myſelf         0.89
    Rptr            0.88
    Activation Density 0.162%

    No Known Activations