INDEX
Explanations
personal references and possessive pronouns
Negative Logits
Token        Logit
roscope      -0.13
zew          -0.12
Jones        -0.12
newline      -0.12
prefixes     -0.12
minus        -0.11
prites       -0.11
pillars      -0.11
purposely    -0.11
Jones        -0.11
Positive Logits
Token        Logit
Number       0.68
number       0.68
Number       0.58
number       0.57
NUMBER       0.54
numero       0.53
número       0.51
_number      0.50
Numero       0.48
-number      0.47
Activations Density 0.086%