INDEX
Explanations
phrases related to beliefs, assumptions, and theories
words related to assumptions and beliefs
Negative Logits
Interstitial   -0.83
sung           -0.75
waters         -0.71
odes           -0.70
thumbnails     -0.69
ateurs         -0.67
oho            -0.65
avid           -0.65
umen           -0.63
HCR            -0.63
Positive Logits
assumptions    1.00
assumption     0.92
underpin       0.80
staking        0.75
assumes        0.73
eers           0.68
presupp        0.67
biases         0.66
premise        0.66
arily          0.65
Activations Density: 0.015%