Explanations
proper nouns and names, specifically with the substring "ans" frequently appearing in the activations
instances of the word "Fr" or variations thereof related to locations, specifically San Francisco
Negative Logits
Jinn        -0.80
izu         -0.68
Izan        -0.66
Contrast    -0.64
Siren       -0.64
Archdemon   -0.63
Redd        -0.63
atsu        -0.62
Lith        -0.61
Khe         -0.61
Positive Logits
isco        0.98
ruary       0.79
ulent       0.78
atism       0.76
rance       0.74
nce         0.70
furt        0.69
fur         0.67
acies       0.67
ateurs      0.65
Activations Density 0.083%
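
The density figure above can be read as the share of token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and do not come from this page, and the tool that produced the 0.083% figure may define density differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active (activation > threshold). This is not confirmed by the source page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 8 of which are active.
sample = [0.0] * 9992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.083% would correspond to the feature firing on roughly 8 of every 10,000 token positions.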