Explanations
numerical values or identifiers within the text
Negative Logits
kili          -0.16
newInstance   -0.16
lier          -0.16
omik          -0.15
ilm           -0.15
rsa           -0.14
afort         -0.14
bew           -0.14
گر            -0.14
imli          -0.14
Positive Logits
fried         0.15
undisclosed   0.15
ritz          0.15
Fritz         0.15
chan          0.15
agle          0.14
.fake         0.14
etical        0.14
th            0.14
Weiss         0.13
Activations Density: 0.214%