INDEX
    Explanations

    phrases related to demonstrating, showing, or highlighting something to others

    Negative Logits
    <bos>       -3.13
    (blank)     -0.91
    (blank)     -0.91
    <?          -0.80
    /***        -0.72
    /**         -0.71
    <?          -0.70
    //{         -0.63
    updateUI    -0.62
    /*          -0.61
    Positive Logits
    Shows       1.13
     Showing    1.12
     Shows      1.10
     SHOWS      1.09
     thut       1.08
     bandung    1.05
    showing     1.05
     SHOW       1.04
    shows       1.03
     embodi     1.02
    Act Density 0.244%

    No Known Activations