INDEX
Explanations
Names that contain "ha", activating at varying strengths; this may indicate a preference for a specific name or concept
Repeated occurrences of the substring "ha"
Negative Logits
papers         -0.87
atories        -0.76
rations        -0.72
ateur          -0.70
lace           -0.67
enhagen        -0.66
entric         -0.65
largeDownload  -0.64
lines          -0.64
parts          -0.62
Positive Logits
wn              1.10
iku             0.96
ha              0.94
pless           0.92
ichi            0.90
qua             0.90
jj              0.89
fter            0.85
pton            0.84
pp              0.84
Activation Density: 0.010%