INDEX
Explanations
references to the 9/11 terrorist attacks
Negative Logits
iations    -0.86
iating     -0.83
itably     -0.82
pmwiki     -0.81
iates      -0.76
ividual    -0.75
Uriel      -0.75
itable     -0.75
iated      -0.74
ively      -0.73
Positive Logits
9999       0.99
999        0.90
ãĤ§        0.86
borough    0.82
090        0.81
08         0.78
06         0.77
ãĥ¼ãĥ³     0.75
masters    0.74
ãĤ©        0.72
Activations Density 2.711%