Explanations
names with a specific structure: two short parts separated by a dash and potentially followed by other characters
references to specific locations or properties associated with notable individuals
Negative Logits
Extras            -0.71
LW                -0.67
Grimes            -0.66
Chambers          -0.64
imore             -0.64
Flavoring         -0.64
kefeller          -0.63
foregoing         -0.63
Vessel            -0.61
DragonMagazine    -0.60
Positive Logits
shaped            1.02
nom               0.96
sized             0.90
sama              0.89
san               0.89
chan              0.89
cell              0.88
cert              0.87
inf               0.86
induced           0.86
Activations Density: 0.135%