Explanations
directional instructions and spatial relationships
references to direction and gender distinctions
Negative Logits
edIn          -0.65
conservancy   -0.65
odka          -0.64
abo           -0.62
allo          -0.61
LES           -0.61
©¶æ¥µ         -0.60
NRS           -0.60
ivated        -0.58
sclerosis     -0.57
Positive Logits
versus        1.25
vs            1.18
/-            0.99
-,            0.92
or            0.90
/             0.90
and           0.89
AND           0.84
->            0.83
->            0.82
Activations Density: 0.473%