INDEX
Explanations
phrases related to actions and consequences
concepts related to well-being and social responsibility
Negative Logits
disadvant     -0.52
Sutherland    -0.51
Azerb         -0.50
DonaldTrump   -0.50
Aval          -0.47
Seym          -0.47
ibrary        -0.46
yip           -0.46
gerald        -0.45
Moroc         -0.44
Positive Logits
thereof       1.08
thereto       0.97
thereafter    0.95
alike         0.91
accordingly   0.91
therein       0.81
versa         0.69
cellaneous    0.62
afterward     0.62
afterwards    0.62
Activations Density: 1.994%