INDEX
    Explanations
    No Explanations Found
    Negative Logits
    -0.96
    ↵↵↵
    -0.39
    <eos>
    -0.39
    			
    -0.35
    ↵↵↵↵
    -0.34
    		
    -0.34
    ↵↵↵↵↵
    -0.32
    ↵↵↵↵↵↵
    -0.32
               
    -0.32
                
    -0.32
    Positive Logits
     myſelf
    1.18
     himſelf
    1.01
     itſelf
    0.98
     purpoſe
    0.98
     pleaſure
    0.95
    NUMX
    0.94
     themſelves
    0.94
     whoſe
    0.92
    )";
    
    0.91
     Efq
    0.91
    Act Density 0.000%

    No Known Activations