INDEX
Explanations
references to the World Trade Center and related locations or events
Negative Logits
igos      -0.16
ondon     -0.15
ichten    -0.15
aight     -0.15
abel      -0.15
ego       -0.15
Trigger   -0.14
rooms     -0.14
iration   -0.14
Trigger   -0.14
Positive Logits
diff       0.17
овор       0.15
ень        0.15
ombine     0.15
Conc       0.15
infeld     0.14
conc       0.14
owi        0.14
conc       0.14
herd       0.14
Activations Density 0.007%