Explanations
mentions of state universities, particularly "State" followed by a number
Negative Logits
istani     -0.17
inski      -0.17
stal       -0.16
erty       -0.15
ensch      -0.14
171        -0.14
erties     -0.14
eration    -0.14
mand       -0.14
eras       -0.14
Positive Logits
University  0.15
olia        0.15
yla         0.14
-issue      0.14
isas        0.14
rokes       0.14
UNIVERSITY  0.14
ampo        0.14
èĪ          0.13
èIJ¥        0.13
Activations Density: 0.008%