INDEX
Explanations
`ade` followed by `cre` or `class`
NEGATIVE LOGITS
un     0.93
as     0.90
o      0.88
il     0.79
ul     0.79
n      0.76
um     0.75
not    0.72
ate    0.68
ov     0.67
POSITIVE LOGITS
ורי           0.69
ков           0.68
outstretched  0.65
evaded        0.63
ні            0.63
ким           0.61
קד            0.60
ковского      0.59
شي            0.59
кин           0.59
Activation Density: 0.000%