INDEX
Explanations
references to kingdoms or royalty
references to various kingdoms
Negative Logits
CAST     -0.80
esters   -0.76
eman     -0.73
matter   -0.72
inez     -0.72
pos      -0.71
lder     -0.70
eport    -0.68
Ñı       -0.65
gotten   -0.64
Positive Logits
DOM      1.09
Hearts   1.06
Arabian  0.92
wide     0.91
doms     0.85
Halls    0.83
Arabia   0.82
loo      0.79
kingdom  0.76
Aram     0.76
Activations Density 0.019%