INDEX
    Explanations

    phrases that involve comparisons or equivalences

    Negative Logits
    ')";\n'               -0.72
    ' Wikispecies'        -0.69
    ')");\n'              -0.66
    ' Falla'              -0.64
    ' RSSSF'              -0.63
    'íslu'                -0.62
    '();)'                -0.61
    ' """\n'              -0.61
    ' NSCoder'            -0.60
    '::::::::::::::::'    -0.60
    Positive Logits
    ' like'       0.92
    ' Like'       0.89
    ' LIKE'       0.83
    'Like'        0.78
    ' Unlike'     0.70
    'like'        0.69
    'LIKE'        0.66
    ' imitate'    0.64
    'Unlike'      0.64
    ' seperti'    0.63
    Activation Density: 0.145%

    No Known Activations