INDEX
Explanations
comparative phrases indicating differences between subjects or entities
Negative Logits
aos        -0.18
}elseif    -0.15
åĬ¡        -0.15
iazza      -0.15
ipel       -0.14
iaz        -0.14
cke        -0.14
mma        -0.13
¡          -0.13
iah        -0.13
Positive Logits
rod        0.16
Rod        0.15
ingleton   0.15
rod        0.15
other      0.15
MATRIX     0.15
other      0.15
sarc       0.15
arent      0.14
others     0.14
Activations Density 0.163%
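
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that definition (activation strictly above a threshold of zero), which the page itself does not state. The function name `activation_density` and the sample data are hypothetical, chosen only so the result matches the 0.163% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 163 active tokens out of 100,000.
sample = [1.0] * 163 + [0.0] * (100_000 - 163)
print(f"{activation_density(sample):.3%}")  # prints 0.163%
```

A density this low would mean the feature is active on roughly 1 token in 600, which is typical of a sparse, fairly specific feature.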