    Explanations

    proper nouns, particularly names of people

    Negative Logits
     Houſe       -0.87
     Perſ        -0.86
     Conſ        -0.84
     Diſ         -0.82
     Reſ         -0.79
     Shaksp      -0.79
     houſe       -0.78
     pleaſure    -0.74
     Theſe       -0.73
     greateſt    -0.73
    Positive Logits
    ")));\n       1.05
    ]));\n        0.97
    '])){\n       0.97
    ')));         0.95
    Искәрмәләр    0.91
    '));\n        0.89
    /$',          0.87
    "];\n         0.84
    }';           0.84
    '))\n         0.83
    Act Density 0.384%

    No Known Activations