INDEX
Explanations
mentions of public figures and their actions or statements
instances of the word "the."
Negative Logits
strom        -0.86
thouse       -0.83
icone        -0.78
Layer        -0.77
strap        -0.77
agree        -0.76
SPONSORED    -0.76
hari         -0.72
breathed     -0.70
gel          -0.70
Positive Logits
importance      1.42
possibility     1.41
dangers         1.39
merits          1.28
impending       1.27
topic           1.21
evils           1.19
origins         1.17
plight          1.16
significance    1.15
Activations Density: 0.336%