INDEX
Explanations
references to military-related terms
terms related to military actions or organizations
Negative Logits
̶          -0.77
posure    -0.75
nder      -0.73
Charge    -0.69
Mist      -0.68
Franken   -0.67
Springs   -0.65
title     -0.64
missions  -0.64
PER       -0.64
Positive Logits
milit        1.29
reluct       0.98
arily        0.94
ament        0.85
guiActiveUn  0.84
ilitary      0.83
untled       0.83
fatig        0.82
heast        0.76
diseng       0.76
Activations Density: 0.002%