INDEX
Explanations
references to different types of food, particularly pizza and sandwich-related terms
Negative Logits
Neal      -0.16
lessly    -0.15
asi       -0.15
ABA       -0.14
IPA       -0.14
Coffee    -0.14
Neal      -0.14
coffee    -0.14
Fol       -0.14
rum       -0.14
Positive Logits
joint      0.26
joint      0.24
joints     0.24
Joint      0.24
Joint      0.23
Parl       0.20
_joint     0.18
shop       0.18
maker      0.18
parl       0.18
Activations Density 0.076%
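The dashboard reports ranked logit lists and an activation density but does not say how they are computed. The sketch below shows one common way such statistics are derived, under stated assumptions: each logit value is taken to be the feature's contribution to that output token's logit, and density is taken to be the percentage of tokens on which the feature's activation is strictly positive. The function names `top_and_bottom_logits` and `activation_density` are hypothetical and do not come from the source.

```python
# Hypothetical helpers; the inputs and definitions are assumptions, not the
# dashboard's documented method.

def top_and_bottom_logits(logit_effects, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    logit_effects: list of (token, value) pairs. A list is used instead of a
    dict because distinct tokens can print identically (e.g. "joint" twice).
    """
    ranked = sorted(logit_effects, key=lambda pair: pair[1])  # ascending by value
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


def activation_density(activations):
    """Percentage of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    effects = [("joint", 0.26), ("shop", 0.18), ("Neal", -0.16), ("coffee", -0.14)]
    neg, pos = top_and_bottom_logits(effects, k=2)
    print(neg)  # [('Neal', -0.16), ('coffee', -0.14)]
    print(pos)  # [('joint', 0.26), ('shop', 0.18)]
    # One firing token out of 1,000 gives a density of 0.1%.
    print(activation_density([0.0] * 999 + [2.5]))
```

Under these assumptions, a density of 0.076% would mean the feature fires on roughly 76 of every 100,000 tokens.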