Explanations
numerical values and percentages
Negative Logits
Token       Logit
oen         -0.16
urum        -0.14
doi         -0.14
gne         -0.14
alom        -0.14
γκ          -0.14
294         -0.13
orch        -0.13
sdale       -0.13
aneously    -0.13
Positive Logits
Token       Logit
olor        0.15
atab        0.14
baz         0.14
agit        0.14
INI         0.14
edor        0.13
count       0.13
oses        0.13
mor         0.13
positor     0.13
Activations Density: 0.007%