INDEX
Explanations
references to historical events, particularly related to World War II
Negative Logits
Token              Logit
Valentine          -0.15
dern               -0.15
(no token text)    -0.15
ainter             -0.15
æĭ                 -0.15
Ze                 -0.14
optic              -0.14
torpedo            -0.14
Joseph             -0.13
teri               -0.13
Positive Logits
Token              Logit
Norm               0.46
Norm               0.38
norm               0.32
landing            0.31
Landing            0.28
landing            0.27
norm               0.26
(norm              0.25
.norm              0.25
Norman             0.23
Activations Density: 0.023%