INDEX
Explanations
phrases containing the character '&' followed by other characters
conjunctions and phrases indicating connections between ideas or entities
Negative Logits
20439        -0.81
bably        -0.77
etheless     -0.75
awaru        -0.73
thodox       -0.70
ivalent      -0.66
yip          -0.64
Canaver      -0.62
theless      -0.61
unusually    -0.59
Positive Logits
&            3.27
&            2.08
(&           1.77
et           1.17
etc          1.16
Vs           1.16
(&           1.12
vs           1.12
+            1.10
AND          1.07
Activations Density 0.021%