    Negative Logits
    Token             Logit
    "<bos>"           -0.64
    "Portale"         -0.56
    "SBATCH"          -0.53
    " colspan"        -0.52
    (blank)           -0.50
    " cardinality"    -0.49
    " Solitaire"      -0.48
    "SizeMode"        -0.48
    "脚注の使い方"      -0.47
    "izr"             -0.47
    Positive Logits
    Token             Logit
    "fed"             0.59
    " fed"            0.57
    " censiti"        0.54
    "redge"           0.53
    " mostrarán"      0.53
    "ridge"           0.53
    "eway"            0.52
    "ظمة"             0.51
    "isma"            0.50
    "omitempty"       0.48
    Activation Density: 0.000%

    No Known Activations