Explanations
references to military positions or high-ranking officials
repeated phrases and references to titles, particularly involving 'of'
Negative Logits
yip       -0.76
izable    -0.70
enos      -0.68
rish      -0.64
hunter    -0.64
icans     -0.60
nih       -0.60
hunt      -0.59
rative    -0.59
xon       -0.58
Positive Logits
Circle     0.65
®          0.63
pton       0.62
Pyth       0.61
edIn       0.60
aiden      0.60
Circ       0.59
Gaal       0.59
pez        0.58
Connector  0.58
Activations Density: 0.145%