Explanations
names, specifically those with the sequence "na" with varying levels of specificity
occurrences of the substring "na" within words
Negative Logits
ienced    -0.81
neys      -0.79
======    -0.71
tails     -0.71
ansas     -0.70
raved     -0.69
loo       -0.66
layer     -0.66
birds     -0.64
wolves    -0.62
Positive Logits
eus       1.24
uthor     1.19
vel       0.92
ples      0.91
isance    0.90
ACP       0.87
ïve       0.86
emi       0.85
veland    0.81
ñ         0.80
Activations Density: 0.033%