INDEX
Explanations
key numeric and statistical relationships or representations in the text
Negative Logits
apor        -0.18
amarin      -0.16
izu         -0.16
,readonly   -0.15
addock      -0.15
ufe         -0.14
ادÙĦ        -0.14
onden       -0.14
imar        -0.13
umat        -0.13
Positive Logits
arks        0.17
sembler     0.16
fos         0.14
še          0.14
rollers     0.14
sprites     0.14
ystore      0.14
Tro         0.14
rac         0.14
tro         0.13
Activations Density 0.061%