Explanations
references to food-related categories and their connections in context
Follows the word "and"
and followed by categories
Negative Logits
...      -0.48
}}}}     -0.48
the      -0.47
--       -0.47
</i>     -0.46
o        -0.46
med      -0.45
Group    -0.45
King     -0.44
li       -0.44
Positive Logits
pleaſure         0.88
Jefus            0.84
GenerationType   0.83
Monfieur         0.79
ſche             0.77
raiſ             0.76
auffi            0.72
endphp           0.71
IntoConstraints  0.71
myſelf           0.71
Activations Density 0.427%