Explanations
references to military personnel and their training
Negative Logits
Token          Logit
reconstruct    -0.14
(!!            -0.14
Brennan        -0.14
лив            -0.14
.appspot       -0.14
army           -0.14
shell          -0.14
çͲ             -0.14
kip            -0.14
lim            -0.14
Positive Logits
Token          Logit
Air            0.28
Air            0.25
wing           0.24
AF             0.23
Wing           0.23
wings          0.22
AF             0.21
afb            0.21
air            0.20
.af            0.20
Activations Density: 0.050%