Explanations
phrases that express comparisons and contrasts between different concepts or entities
Negative Logits
Andersen    -0.76
aeda        -0.74
enhagen     -0.73
ilogy       -0.72
akuya       -0.71
PsyNet      -0.70
ocument     -0.70
document    -0.70
\\\\\\\\    -0.69
elsen       -0.67
Positive Logits
Medium      1.48
medium      1.43
Medium      1.36
low         1.19
medium      1.18
shallow     1.13
small       1.13
Low         1.10
Low         1.09
low         1.08
Activations Density: 3.358%