Explanations
"the" followed by capitalized words
Negative Logits
Token         Value
napkins       0.67
ці            0.67
без           0.66
piercings     0.62
<unused702>   0.62
சரியான        0.61
исти          0.60
сме           0.60
ст            0.60
ഹ്ലാദ         0.59
Positive Logits
Token         Value
la            0.60
Om            0.57
landet        0.55
ln            0.55
rodzaj        0.55
ุ             0.54
Older         0.53
t             0.53
td            0.53
the           0.52
Activations Density 0.705%