Explanations
references to entities or groups, particularly related to demographics or categories
Negative Logits
Token           Logit
utils           -0.69
nings           -0.64
atoes           -0.62
few             -0.62
pse             -0.62
guiName         -0.62
arde            -0.61
preparations    -0.60
taker           -0.60
xxxxxxxx        -0.60

Positive Logits
Token           Logit
differing        1.01
varying          0.93
color            0.89
colour           0.82
various          0.81
different        0.79
diverse          0.78
other            0.77
varied           0.76
Colour           0.73
Activations Density: 0.288%