INDEX
Explanations
phrases that define foundational concepts or principles
Negative Logits
tail     -0.20
ish      -0.19
oit      -0.19
лев      -0.17
esi      -0.17
irsch    -0.17
aise     -0.16
outs     -0.16
alim     -0.15
sdale    -0.15
Positive Logits
ëŀ       0.17
most     0.16
conds    0.16
dır      0.16
глÑıд    0.15
curity   0.15
paring   0.15
nut      0.15
croll    0.15
yal      0.15
Activations Density 0.022%