Explanations
phrases indicating academic or institutional affiliations
college of [field]
Negative Logits
Token     Logit
dol       -0.42
مقد       -0.42
scre      -0.41
Svar      -0.40
...       -0.38
<eos>     -0.38
Sector    -0.36
Sozial    -0.35
Ø         -0.35
('        -0.35
Positive Logits
Token      Logit
College    0.94
College    0.85
COLLEGE    0.82
college    0.80
college    0.78
COLLEGE    0.74
sizeCache  0.71
OGND       0.70
colleges   0.68
Colleges   0.68
Activations Density: 0.005%