INDEX
    Explanations

    phrases related to capabilities or abilities

    Negative Logits
    <bos>          -3.26
    /***           -0.84
    /**            -0.83
     intersper     -0.80
    (whitespace)   -0.79
    (whitespace)   -0.79
    <?             -0.76
     harmonize     -0.67
     endow         -0.64
     banish        -0.62
    Positive Logits
     ananas     1.00
     thuy       0.99
     kafe       0.98
     saar       0.98
     cannes     0.97
     kasa       0.95
     seksi      0.95
     bandung    0.93
     maroc      0.93
     jawa       0.92
    Act Density 0.105%

    No Known Activations