Explanations
This neuron activates on words that appear inside quotation marks, typically quoted titles or names.
Negative Logits
confirmation
-0.07
incip
-0.06
/type
-0.06
текущ
-0.06
belum
-0.06
animation
-0.06
Girls
-0.06
سان
-0.06
-animation
-0.06
olik
-0.06
Positive Logits
пу
0.08
.Merge
0.06
정
0.06
)();↵
0.06
hill
0.06
кім
0.06
={[
0.06
compensate
0.06
аром
0.06
(delete
0.06
Activations Density 0.109%
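
As a rough illustration of how a density figure like this could be computed, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is counted per token position, and a position
    counts as active when its activation is strictly greater than threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 109 of them active.
sample = [0.0] * 99_891 + [1.5] * 109
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.109% would mean the neuron fires on roughly one token in every 900, which fits a neuron that responds only to a narrow pattern such as quoted text.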