INDEX
Explanations
words related to cooking and food preparation
actions taking place in specific scenarios
Negative Logits
Fact            -0.76
Osw             -0.70
clinton         -0.69
paralle         -0.69
avier           -0.69
ormon           -0.68
arton           -0.68
particularly    -0.68
arij            -0.68
Correct         -0.67
Positive Logits
oblivious       1.02
hordes          0.98
screaming       0.98
unsuspecting    0.95
endless         0.94
endlessly       0.90
goddamn         0.90
drunken         0.89
waving          0.89
dudes           0.89
Activations Density 0.771%