INDEX
Explanations
names with the pattern "ker" followed by a number ranging from 5 to 10
proper nouns, specifically names of people
Negative Logits
Token       Logit
Cind        -0.68
ashtra      -0.68
ental       -0.66
Reviewer    -0.66
icist       -0.64
ollah       -0.63
constitu    -0.63
Palestin    -0.60
ella        -0.58
querque     -0.58
Positive Logits
Token       Logit
chief       1.01
haw         0.92
luster      0.91
wagen       0.90
ker         0.89
kie         0.84
crew        0.80
rieg        0.79
lar         0.78
rish        0.76
Activations Density: 0.043%