Explanations
directional and positional language
Negative Logits
pis             -0.17
.writeln        -0.16
utsch           -0.16
ÙĪØµ            -0.15
reh             -0.15
dob             -0.15
pha             -0.14
_FF             -0.14
ixo             -0.14
go              -0.13
Positive Logits
ward            0.17
chemas          0.16
zung            0.16
wards           0.15
.scalablytyped  0.14
Aph             0.14
WARD            0.14
hattan          0.14
åħī             0.14
415             0.14
Activations Density 0.073%