INDEX
Explanations
entities and organizations related to social issues and sciences
Negative Logits
ntag     -0.18
Nat      -0.18
Nad      -0.18
Nx       -0.17
neh      -0.17
Nat      -0.17
nodoc    -0.17
ondo     -0.17
NX       -0.17
enne     -0.17
Positive Logits
ch       0.23
Ch       0.19
Wich     0.18
ch       0.17
chi      0.16
.ch      0.16
ycl      0.16
sh       0.16
Ch       0.15
ich      0.15
Activations Density 0.066%