INDEX
Explanations
patterns of comparison and contrast in statements
Negative Logits
urma        -0.15
541         -0.14
Lâm         -0.14
chet        -0.14
kinson      -0.14
owa         -0.14
Threshold   -0.14
urf         -0.14
บบ          -0.14
anywhere    -0.14
Positive Logits
uant        0.17
Salir       0.16
erin        0.16
erie        0.14
ged         0.14
dc          0.13
unci        0.13
bad         0.13
ania        0.13
Estr        0.13
Activations Density 0.115%