Explanations
instances of the words "hot" and "warmed" related to topics of health or cooking
Negative Logits
Downs      -0.16
Deutsch    -0.15
alth       -0.15
Poke       -0.15
olume      -0.13
uddy       -0.13
Adds       -0.13
oundary    -0.13
tul        -0.13
æ          -0.13
Positive Logits
êµ´         0.15
bsd         0.15
VRTX        0.15
ë¡Ŀ         0.15
lique       0.14
RelativeTo  0.14
ì£          0.14
ÙĨس        0.14
trinsic     0.14
vÃŃc        0.14
Activations Density: 0.006%