Explanations
proper nouns related to characters or places
specific proper nouns and references to food, particularly broccoli
Negative Logits
arning              -0.86
rer                 -0.81
achment             -0.77
Cheong              -0.70
Doodle              -0.69
oppable             -0.69
antis               -0.69
igree               -0.69
iple                -0.68
gol                 -0.67
Positive Logits
Bran                0.97
Stark               0.78
burner              0.72
illary              0.72
Beng                0.71
helm                0.71
externalActionCode  0.71
dylib               0.67
cabbage             0.66
é»Ĵ                 0.66
Activations Density: 0.023%