INDEX
Explanations
the term "mon" or variations related to "monomer" or "monopole"
Negative Logits
Token        Logit
featureID    -0.81
itſelf       -0.81
<unused41>   -0.80
<unused3>    -0.80
<unused42>   -0.80
<unused43>   -0.79
<unused47>   -0.79
<unused74>   -0.79
<unused51>   -0.79
[@BOS@]      -0.79
Positive Logits
Token        Logit
ute          0.57
further      0.56
imp          0.56
mon          0.54
tailored     0.52
shot         0.47
tailor       0.47
tail         0.46
=            0.43
TEXTURE      0.42
Activations Density: 0.366%