Explanations
references to fortifications or places considered secure or heavily protected
terms associated with fortified positions or strong defensive locations
Negative Logits
Token       Logit
hy          -0.83
ordan       -0.82
andro       -0.78
respond     -0.75
ivan        -0.74
apers       -0.74
matter      -0.73
ensation    -0.71
videos      -0.70
uci         -0.70
Positive Logits
Token       Logit
stronghold  1.41
strongh     1.32
bast        0.86
elector     0.82
advoc       0.81
stead       0.78
urst        0.78
rongh       0.76
tradem      0.75
enclave     0.73
Activations Density: 0.012%