INDEX
Explanations
references to lists or catalog entries
Negative Logits
dent       -0.18
zem        -0.17
endo       -0.15
enstein    -0.15
pest       -0.15
idor       -0.15
iland      -0.14
dff        -0.14
ifa        -0.14
ollen      -0.14
Positive Logits
ade        0.45
ad         0.31
ADE        0.25
rade       0.24
ades       0.23
ад         0.22
ande       0.20
ada        0.20
ade        0.20
aded       0.19
Activations Density: 0.011%