INDEX
Explanations
references to eating and restaurants
NEGATIVE LOGITS
']))      -0.92
}}],      -0.84
}));      -0.79
'])){     -0.77
hii       -0.73
Nilsson   -0.70
]))       -0.68
arqu      -0.68
mij       -0.68
])));     -0.66
POSITIVE LOGITS
eat       2.52
eats      2.24
Eat       2.22
EAT       2.21
eating    2.19
eaten     2.15
Eat       2.08
Eating    2.03
Eating    2.02
ate       1.98
Activations Density: 0.031%