Explanations
phrases related to specific names or proper nouns, particularly "Ben" or variations of it
proper nouns, specifically names and titles
Negative Logits
Token         Logit
oslav         -0.79
FORMATION     -0.67
Interested    -0.65
VIDE          -0.65
olor          -0.64
veins         -0.64
ngth          -0.63
é¾įåĸļ士      -0.63
glers         -0.62
constitu      -0.61
Positive Logits
Token         Logit
rama          0.79
heid          0.72
gui           0.71
ilot          0.71
ente          0.70
aline         0.69
fen           0.67
ij士           0.66
zeb           0.66
alion         0.66
Activations Density: 0.071%