INDEX
Explanations
mentions of speeches
references to speeches
Negative Logits
Soldiers      -0.65
Associ        -0.65
rome          -0.62
involved      -0.59
Mississ       -0.58
saline        -0.57
Username      -0.56
Coast         -0.56
Users         -0.56
usercontent   -0.56
POSITIVE LOGITS
writers       1.02
writer        1.01
speeches      0.90
writing       0.80
speech        0.80
delivered     0.77
denouncing    0.77
imped         0.76
anooga        0.76
speech        0.75
Activations Density 0.043%