INDEX
    Explanations

    references to historical demographics and categories of people or places

    births and nationalities

    Negative Logits
    <eos>
    -0.59
    ↵↵
    -0.57
      
    -0.53
       
    -0.50
     […]
    -0.49
           
    -0.47
        
    -0.47
    <strong>
    -0.47
         
    -0.45
          
    -0.42
    Positive Logits
     queſta
    1.23
    ſchaft
    1.10
    <unused52>
    1.09
    <unused8>
    1.09
    <unused41>
    1.09
    <unused23>
    1.08
    <unused28>
    1.08
    <unused16>
    1.08
    [@BOS@]
    1.08
    <unused14>
    1.08
    Activation Density: 0.020%

    No Known Activations