INDEX
Explanations
words related to causing action or response
phrases or words indicating triggers for actions or responses
Negative Logits
Token      Logit
atum       -0.85
çĦ         -0.79
wn         -0.75
oyal       -0.73
nect       -0.72
Í          -0.71
framework  -0.71
thin       -0.70
ãĥİ        -0.70
illusion   -0.69
Positive Logits
Token         Logit
outcry        1.12
inquiries     1.00
warnings      0.98
widespread    0.96
speculation   0.96
complaints    0.95
questions     0.92
accusations   0.91
cries         0.91
condemnation  0.91
Activations Density: 0.074%