INDEX
Explanations
phrases related to warnings or negative consequences
elements related to quantity or numerical values associated with various contexts
Negative Logits
¥ŀ	-0.64
obo	-0.60
Fat	-0.60
agon	-0.58
Port	-0.58
uder	-0.58
Ton	-0.57
Mor	-0.56
OB	-0.56
mitter	-0.55
Positive Logits
and	0.98
and	0.98
AND	0.87
&	0.78
&	0.76
andi	0.74
andan	0.72
soType	0.72
partName	0.70
ands	0.67
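The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way dashboards like this produce such lists is to project the feature's decoder direction through the model's unembedding matrix and take the bottom-k and top-k entries. The sketch below assumes that method; the function name `logit_effects`, the array shapes, and the absence of any layer-norm folding are illustrative assumptions, not the dashboard's confirmed implementation.

```python
import numpy as np


def logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    Assumptions: w_dec_row has shape (d_model,), w_unembed has shape
    (d_model, vocab_size), and vocab is a list of vocab_size token strings.
    No layer-norm scaling is applied here.
    """
    logits = w_dec_row @ w_unembed  # shape (vocab_size,)
    order = np.argsort(logits)  # token indices, most negative effect first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.9, -0.7], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print("negative:", neg)
print("positive:", pos)
```

Under this reading, the repeated entries such as `and` likely correspond to distinct vocabulary tokens (for example, with and without a leading space) that render identically once whitespace is stripped.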
Activation Density: 0.156%
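Activation density is usually read as the fraction of tokens in a sample on which the feature fires; 0.156% corresponds to roughly 1.56 tokens per 1,000. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are illustrative assumptions rather than the dashboard's documented definition.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Toy example: the feature fires on 2 of 8 tokens.
density = activation_density([0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"Activation density: {density * 100:.3f}%")
```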