    Explanations

    references to specific authors and their work in academic papers

    Initials followed by a period or comma

    authors' initials and names

    Negative Logits
    ]`              -1.00
    ^(@)            -0.98
     autorytatywna  -0.96
    $")             -0.95
     }}$}           -0.94
    .")]            -0.94
    >\<^            -0.94
    %");            -0.93
    %")             -0.93
    "]}             -0.92
    Positive Logits
     stesso         0.68
     himself        0.64
    ,               0.63
    .               0.53
     stessa         0.53
    .,              0.51
     Himself        0.50
     stessi         0.50
     (              0.45
    '               0.44
    Activation Density: 0.205%

    No Known Activations