Explanations
words related to protection, defense, and security
Negative Logits
hall       -0.70
jet        -0.67
LINE       -0.64
LIN        -0.63
lore       -0.63
hler       -0.61
lins       -0.61
zos        -0.59
hyp        -0.57
Minutes    -0.57
Positive Logits
against        1.21
against        1.10
iveness        1.09
ively          1.03
Against        0.98
ously          0.97
atively        0.96
folios         0.92
orate          0.91
ailability     0.88
Activations Density: 0.796%