INDEX
Explanations
references to the physical manipulation and arrangement of objects or materials
Negative Logits
ετ          -0.14
款          -0.14
rve         -0.13
feit        -0.13
าจ          -0.13
ussen       -0.13
pedo        -0.13
、そう      -0.13
JT          -0.13
ινό         -0.13
Positive Logits
each        0.28
one         0.28
top         0.22
itself      0.22
opposite    0.20
each        0.20
top         0.19
strategic   0.19
where       0.19
either      0.19
Activations Density 0.381%