Explanations
academic disciplines and fields of study
Negative Logits
leta      -0.16
ross      -0.16
anned     -0.15
rone      -0.15
agon      -0.15
一緒      -0.15
imary     -0.14
sert      -0.14
言った    -0.14
ilar      -0.14
Positive Logits
magna     0.19
minor     0.19
601       0.17
607       0.17
251       0.17
uga       0.16
degree    0.16
theory    0.16
followed  0.16
degrees   0.15
Activations Density 0.042%
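
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 4 of which fire.
sample = [0.0] * 9996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.040%
```

Under this reading, a density of 0.042% would mean the feature fires on roughly 4 of every 10,000 sampled tokens, which marks it as a sparse feature.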