INDEX
Explanations
phrases indicating obstructions or barriers
Negative Logits
incinn    -0.83
ocally    -0.83
aughed    -0.82
iosyncr   -0.81
milo      -0.81
akespe    -0.75
ocry      -0.74
uesday    -0.69
irted     -0.69
olate     -0.68
Positive Logits
thereof       0.77
finding       0.76
liness        0.76
of            0.69
ward          0.66
conversions   0.65
lihood        0.64
liest         0.63
.--           0.62
WARD          0.60
Activations Density 0.009%