INDEX
    Explanations

    proper nouns and references to specific locations or historical figures

    Negative Logits
    "):\n     -1.14
    '):\n     -1.11
    ")));\n   -1.05
    "){\n     -1.04
    "]);\n    -1.02
    ")){\n    -1.01
    $")       -1.01
    )");\n    -1.00
    '){\n     -0.99
    []\n      -0.98
    Positive Logits
    ,         2.80
    (),       1.17
     ,        1.17
    ،         1.14
              1.13
    !,        1.04
    $,        1.04
    ,\n       1.00
    .,        1.00
    ?,        0.99
    Act Density 9.892%

    No Known Activations