INDEX
Explanations
references to educational institutions and award-related events
Negative Logits
Token        Logit
unca         -0.16
utow         -0.15
onta         -0.14
abar         -0.14
eil          -0.14
715          -0.14
725          -0.14
jin          -0.13
anzi         -0.13
Testament    -0.13
Positive Logits
Token        Logit
Arlington    0.28
Fairfax      0.27
703          0.24
Alexandria   0.23
Rest         0.20
Mason        0.20
Tyson        0.18
Shir         0.18
Alexand      0.18
Loud         0.18
Activations Density: 0.026%