INDEX
Explanations
mentions of the word "penguins"
references to penguins
Negative Logits
WORK
-0.82
ysis
-0.80
puter
-0.73
phas
-0.71
usted
-0.66
¿½
-0.66
ILCS
-0.65
neau
-0.65
parts
-0.65
lder
-0.64
Positive Logits
Token     Logit
insula    0.99
Penguins  0.86
pengu     0.82
aukee     0.77
atoon     0.77
engu      0.74
keye      0.72
Pengu     0.72
unia      0.71
Hots      0.67
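Dashboards of this kind usually compute the logit lists as the feature's direct effect on the output: the feature's decoder direction is dotted with each token's unembedding vector, and the most positive and most negative scores are listed. The page does not state this definition, so the sketch below assumes it. It uses plain Python lists, and `logit_effects` and `top_and_bottom` are hypothetical helper names. The toy vectors in the usage lines are invented for illustration and do not reproduce the values above.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_vec: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed definition: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example with invented 2-d vectors (hypothetical data).
decoder = [1.0, 0.0]
unembed = {
    "Penguins": [0.9, 0.1],
    "pengu": [0.8, 0.0],
    "WORK": [-0.8, 0.3],
    "parts": [-0.6, 0.5],
}
top, bottom = top_and_bottom(logit_effects(decoder, unembed), k=2)
print(top)     # [('Penguins', 0.9), ('pengu', 0.8)]
print(bottom)  # [('WORK', -0.8), ('parts', -0.6)]
```

In a real model the decoder vector and unembedding columns would have the model's hidden size, and some dashboards also account for the final layer norm. This sketch ignores that step.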
Activation Density: 0.025%
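Activation density is commonly read as the fraction of token positions in a sample dataset on which the feature fires, i.e. has activation above zero. That reading is an assumption, since the page does not define the statistic. Under it, 0.025% means the feature is active on roughly 1 token in 4,000. The minimal sketch below uses a hypothetical `activation_density` helper and an invented activation list.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Invented sample: 40,000 token positions, the feature fires on 10 of them.
acts = [0.0] * 39_990 + [1.5] * 10
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```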