Explanations
names and initials of people
proper nouns, specifically names and initials associated with people or entities
Negative Logits
constitu    -0.68
oslav       -0.65
veins       -0.64
glers       -0.61
龍喚士      -0.60
Flavoring   -0.60
resid       -0.59
runes       -0.59
UFF         -0.59
enthus      -0.57
Positive Logits
rama        0.79
enment      0.76
amins       0.75
ij士        0.73
ente        0.72
asaki       0.72
heid        0.71
zeb         0.71
gui         0.70
amina       0.69
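Logit lists like the ones above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that direct projection (ignoring any final LayerNorm or scaling the dashboard may apply); the names `feature_logits`, `decoder_row`, and `unembed` are hypothetical and not taken from the dashboard's code.

```python
def feature_logits(decoder_row, unembed, vocab, k=10):
    """Hypothetical sketch: top-k positive and negative logits for one feature.

    decoder_row: list[float] of length d_model (the feature's decoder direction)
    unembed: d_model rows, each a list[float] of length len(vocab) (assumed W_U)
    vocab: list[str] mapping token ids to token strings
    """
    if len(unembed) != len(decoder_row):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab_size = len(vocab)
    # Logit contribution of this direction to every token: decoder_row @ unembed.
    logits = [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(vocab_size)
    ]
    # Sort token ids from most negative to most positive logit.
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    top_pos = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    top_neg = [(vocab[t], logits[t]) for t in order[:k]]
    return top_pos, top_neg


if __name__ == "__main__":
    pos, neg = feature_logits(
        [1.0, -0.5],
        [[0.2, -0.1, 0.7, 0.0], [0.4, 0.6, -0.2, 0.0]],
        ["a", "b", "c", "d"],
        k=1,
    )
    print(pos, neg)
```

With a real model, `decoder_row` would be one row of the SAE decoder and `unembed` the model's unembedding matrix, typically handled as arrays rather than nested lists.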
Activation Density 0.062%
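Activation density is the fraction of sampled token positions on which the feature is active, so 0.062% means the feature fires on roughly 62 of every 100,000 tokens. The minimal sketch below assumes "active" means an activation strictly above a threshold of zero; the function name and `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 62 firing positions out of 100,000 tokens corresponds to 0.062%.
    acts = [0.0] * 99_938 + [1.3] * 62
    print(f"{activation_density(acts):.3%}")
```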