Explanations
words related to North Korea
Negative Logits
rative    -0.73
ulous     -0.69
imental   -0.68
ername    -0.67
ration    -0.67
FAULT     -0.67
ATURE     -0.67
TED       -0.66
NRS       -0.66
ILA       -0.66
Positive Logits
ampton    1.20
Carolina  1.14
Korea     1.10
western   1.04
Pole      1.03
Koreans   1.02
Dakota    1.01
shore     0.93
umber     0.93
Korean    0.92
Activations Density 0.253%