INDEX
    Explanations

    Frequent occurrences of the word "the"

    Tokens preceding nouns or titles

    "The" followed by specific nouns

    Negative Logits
     sanitaires   -0.68
     sauvages     -0.67
     mukaan       -0.66
     löytyy       -0.65
     čierna       -0.65
     sienta       -0.64
     aquilo       -0.63
     braccia      -0.63
     mellett      -0.62
     esetén       -0.60
    Positive Logits
     same         0.96
    ")));\n       0.94
    "]}           0.89
    ']}           0.89
     latter       0.89
    )];\n         0.88
    ".\n          0.86
     entire       0.86
     following    0.85
    "]\n          0.81
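    Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that procedure. The function `top_logits`, the 2-dimensional vectors, and the four-token vocabulary are hypothetical and chosen only to illustrate the ranking step.

```python
from typing import Dict, List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens for a feature.

    Assumption: each token's logit effect is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending by score: the head holds the most negative tokens.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Hypothetical toy unembedding (token -> 2-d vector).
unembed = {
    " same": [1.0, 0.0],
    " latter": [0.8, 0.1],
    " sauvages": [-0.7, 0.0],
    " mukaan": [-0.5, -0.2],
}
neg, pos = top_logits([1.0, 0.5], unembed, k=2)
print("negative:", [tok for tok, _ in neg])
print("positive:", [tok for tok, _ in pos])
```

    Real dashboards may also fold in normalization or bias terms before ranking; this sketch deliberately omits them.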
    Activation Density: 0.153%
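    Activation density is usually reported as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The function `activation_density` and the sample activations are hypothetical. At 0.153%, that corresponds to roughly 153 firing tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as firing when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    return firing / total if total else 0.0


# Hypothetical sample: 153 firing tokens spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(153):
    acts[i * 600] = 1.2
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.153%
```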

    No Known Activations