INDEX
Explanations
phrases involving an unspecified group or entities in a negative context
references to the word "some"
Negative Logits
ocene       -0.76
tainment    -0.69
enance      -0.65
ettes       -0.64
gon         -0.64
sburgh      -0.63
borne       -0.63
raid        -0.63
uckle       -0.62
meet        -0.62
Positive Logits
ones          1.03
place         1.03
semblance     0.93
body          0.93
how           0.92
unspecified   0.82
HOW           0.73
mornings      0.70
indist        0.70
degree        0.69
Activations Density: 0.117%