INDEX
Explanations
proper nouns related to royalty
mentions of royalty or princes
Negative Logits
Token     Logit
selves    -0.74
anwhile   -0.69
onica     -0.66
fram      -0.66
KT        -0.66
bers      -0.64
£ı        -0.63
RD        -0.62
visors    -0.61
ãĤī       -0.60
Positive Logits
Token     Logit
loo       0.90
cipled    0.86
doms      0.84
Clause    0.79
Rupert    0.77
Prince    0.77
pin       0.76
Albert    0.73
afort     0.72
Prince    0.71
Activations Density: 0.020%