Explanations
phrases related to social issues and ideologies
indications of causality or significant changes in discussions
Negative Logits
ilar         -0.78
noticed      -0.66
USS          -0.65
ebted        -0.63
regretted    -0.63
ONDON        -0.61
OA           -0.60
hairs        -0.58
Sus          -0.58
ambitious    -0.58
Positive Logits
ALWAYS        0.83
abound        0.83
flourish      0.82
dictate       0.82
everywhere    0.80
perv          0.78
trump         0.77
always        0.73
democrat      0.72
always        0.70
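The dashboard does not say how the logit values are computed. A common approach, assumed here rather than confirmed by the source, is to project the feature's decoder direction onto each token's unembedding vector and then report the most positive and most negative results. The sketch uses hypothetical helper names (`logit_weights`, `top_and_bottom`) and a made-up 2-dimensional, four-token vocabulary, so its numbers are illustrative only.

```python
# Hypothetical sketch: "logit weights" of a sparse-autoencoder feature.
# ASSUMPTION: each value is the dot product of the feature's decoder
# direction with a token's unembedding vector (a logit-lens style projection).
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, float]]


def logit_weights(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project the decoder direction onto every token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(weights: Dict[str, float], k: int) -> Tuple[Pairs, Pairs]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy unembedding vectors; these numbers are invented for illustration.
    unembed = {
        "always": [1.0, 0.0],
        "flourish": [0.5, 0.5],
        "noticed": [-1.0, 0.0],
        "hairs": [0.0, -0.5],
    }
    decoder_dir = [0.8, 0.2]
    top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), k=2)
    print("positive:", [tok for tok, _ in top])     # positive: ['always', 'flourish']
    print("negative:", [tok for tok, _ in bottom])  # negative: ['noticed', 'hairs']
```

Under this reading, a large positive value means the feature, when active, pushes the model toward predicting that token, and a large negative value means it pushes away from it.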
Activation density: 0.424%
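Activation density is usually read as the fraction of tokens on which the feature fires at all. Under that assumption (a strictly positive activation, threshold 0.0, neither confirmed by the source), 0.424% means the feature is active on roughly 4 tokens in every 1,000. The function name `activation_density` and the toy activations below are hypothetical.

```python
# Hypothetical sketch: activation density = firing tokens / total tokens.
# ASSUMPTION: a token counts as "firing" when its activation exceeds 0.0.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of activations strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


if __name__ == "__main__":
    # Toy data: 4 firing tokens out of 1,000.
    acts = [0.0] * 996 + [1.2, 0.3, 2.5, 0.9]
    print(f"{activation_density(acts):.3%}")  # 0.400%
```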