INDEX
Explanations
college and academic subjects
Negative Logits
Token    Value
า    0.80
ين    0.77
ুল    0.73
ва    0.69
ا    0.69
ק    0.67
די    0.67
یم    0.61
ु    0.59
ुल    0.58
Positive Logits
Token    Value
College    0.81
-    0.75
College    0.75
COLLEGE    0.72
college    0.67
college    0.66
/    0.63
t    0.61
you    0.59
stance    0.58
Activations Density: 0.004%