INDEX
    Explanations

    universities and colleges

    Negative Logits
    Token           Value
    " UCLA"         0.60
    "Stanford"      0.60
    "stanford"      0.60
    " Stanford"     0.58
    " Harvard"      0.53
    " Yale"         0.52
    "Harvard"       0.50
    "Yale"          0.46
    "berkeley"      0.46
    "taobao"        0.45

    Positive Logits
    Token           Value
    " Liberal"      0.60
    " Dominican"    0.59
    " liberal"      0.55
    " Biology"      0.55
    " Franciscan"   0.54
    "Liberal"       0.54
    "ONU"           0.54
    " Lutheran"     0.52
    " ONU"          0.52
    "Biology"       0.50

    Activation Density: 0.003%

    No Known Activations