INDEX
Explanations
mentions of academic achievements, especially graduating from schools or universities
references to the act of graduating or completion of education
Negative Logits
erness    -0.87
bro       -0.71
hed       -0.68
pers      -0.67
Reprodu   -0.65
eur       -0.65
otin      -0.62
Twist     -0.61
mob       -0.61
ger       -0.60
Positive Logits
uates        0.90
SCHOOL       0.87
college      0.86
uating       0.83
College      0.83
graduation   0.81
ploma        0.79
icient       0.75
utes         0.73
icut         0.73
Activations Density: 0.029%
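To put the density figure in concrete terms: 0.029% corresponds to roughly 29 active token positions per 100,000. The sketch below assumes that density means the fraction of token positions where the feature's activation is above zero; that definition, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 29 active positions out of 100,000 tokens.
sample = [1.5] * 29 + [0.0] * 99_971
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.029%
```

A density this low suggests the feature fires on a narrow set of contexts, which is consistent with the graduation-related explanations listed above.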