INDEX
Negative Logits
UCLA        0.60
Stanford    0.60
stanford    0.60
Stanford    0.58
Harvard     0.53
Yale        0.52
Harvard     0.50
Yale        0.46
berkeley    0.46
taobao      0.45
Positive Logits
Liberal     0.60
Dominican   0.59
liberal     0.55
Biology     0.55
Franciscan  0.54
Liberal     0.54
ONU         0.54
Lutheran    0.52
ONU         0.52
Biology     0.50
Activations Density: 0.003%