INDEX
Explanations
phrases containing the word "usual" followed by other descriptors
references to common or typical patterns and behaviors
Negative Logits
acus       -0.81
zon        -0.79
vic        -0.75
rea        -0.72
mented     -0.70
wic        -0.70
<?         -0.69
hani       -0.66
justice    -0.63
Darius     -0.63
Positive Logits
disclaimer    0.86
caveat        0.83
caveats       0.77
tendency      0.72
assumption    0.72
picture       0.71
amount        0.71
assortment    0.70
thing         0.70
"+            0.69
Activations Density 0.106%