INDEX
Explanations
references to academic roles, research funding, and scholarships
Negative Logits
fiber        -0.18
labor        -0.17
behavioral   -0.16
honored      -0.16
theater      -0.16
analyzed     -0.16
neighbors    -0.15
honors       -0.15
counseling   -0.15
fibers       -0.15
POSITIVE LOGITS
EPS          0.27
EPS          0.25
Outputs      0.23
UK           0.21
UK           0.21
Norwich      0.20
outputs      0.20
outputs      0.19
Imperial     0.19
Outputs      0.19
Activations Density 0.117%