INDEX
    Explanations

    Phrases indicating thought and reflection on past actions

    Negative Logits
    <bos>           -2.00
    /**             -1.06
    (unrendered)    -1.02
     effectually    -0.90
    (unrendered)    -0.88
    <?              -0.88
     forbear        -0.86
     quitted        -0.85
     gratify        -0.82
    <?              -0.82
    Positive Logits
     vasi           0.89
     tyn            0.87
     asfal          0.85
     ananas         0.84
     ortop          0.84
     alpes          0.83
     marte          0.82
     torba          0.81
     Ferdin         0.78
     antropo        0.77
    Activation Density: 0.913%
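    Activation density is typically the share of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.913% corresponds to roughly one token in 110 (100 / 0.913 ≈ 109.5). A minimal sketch, using a hypothetical helper `activation_density` and an assumed firing threshold of 0:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = fraction of tokens where the
    feature fires, expressed as a percentage.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Feature fires on 2 of 4 tokens
    print(activation_density([0.0, 1.5, 0.0, 2.0]))  # 50.0
```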

    No Known Activations