Explanations
phrases indicating composition or quantity of entities or items
Negative Logits
Sport      -0.72
AIDS       -0.70
ira        -0.69
soType     -0.66
dad        -0.66
inition    -0.64
bringer    -0.63
pal        -0.63
elsius     -0.62
traged     -0.62
Positive Logits
several    0.89
three      0.87
four       0.85
five       0.83
varying    0.82
multiple   0.82
two        0.81
seven      0.81
dots       0.80
six        0.79
Activations Density: 0.040%