Explanations
phrases related to assumptions
references to societal or cultural assumptions
Negative Logits
Interstitial   -0.99
sung           -0.88
thumbnails     -0.80
CLASSIFIED     -0.77
waters         -0.77
ansk           -0.74
oho            -0.73
dos            -0.71
ateurs         -0.70
HCR            -0.70
Positive Logits
assumptions    1.07
assumption     1.00
premise        0.86
underpin       0.85
biases         0.81
presupp        0.78
disclaimer     0.74
assumes        0.71
underlying     0.69
lessly         0.68
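The two score lists do not say how their values are derived. A common construction in sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, projects the feature's decoder direction through the model's unembedding matrix and then lists the k highest-scoring and k lowest-scoring tokens. The sketch below shows only that ranking step on a made-up two-dimensional example; `decoder_dir`, `unembed`, and the toy vectors are hypothetical and are not the weights behind the numbers above.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (positive, negative): the k largest scores in descending order
    and the k smallest scores in ascending order (most negative first)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Hypothetical toy data: a 2-d decoder direction and four token vectors.
decoder_dir = [1.0, -0.5]
unembed = {
    "assumption": [1.0, 0.0],
    "premise": [0.5, -0.5],
    "waters": [-0.5, 0.5],
    "sung": [-1.0, 0.0],
}

scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, k=2)
print(positive)  # [('assumption', 1.0), ('premise', 0.75)]
print(negative)  # [('sung', -1.0), ('waters', -0.75)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; on a real vocabulary a partial selection such as `heapq.nlargest` would avoid sorting every token.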
Activation Density 0.025%
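Activation density is read here, as an assumption since the page does not define it, as the percentage of sampled token positions at which the feature's activation is above zero. The sketch below computes that percentage. The `threshold` parameter and the toy sample of 40,000 tokens with 10 nonzero activations are hypothetical, chosen only so that the arithmetic reproduces a density of 0.025%. They are not the data behind this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`
    (assumed definition: a feature 'fires' when its activation is > 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 10 firing positions out of 40,000 tokens.
acts = [0.0] * 39_990 + [0.3] * 10
print(f"{activation_density(acts):.3f}%")  # 0.025%
```

At this density the feature is active on roughly 1 token in 4,000, so it fires rarely.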