Explanations
references to food items, especially those related to chocolate and caramel
references to specific brands and types of food products, particularly sweets and snacks
Negative Logits
Token      Logit
NN         -0.70
ales       -0.69
xy         -0.68
alt        -0.67
ĩ          -0.67
Jinping    -0.66
lar        -0.64
^^^^       -0.64
Builder    -0.64
printf     -0.64
Positive Logits
Token        Logit
••••         0.88
istically    0.80
mental       0.77
Tasman       0.76
istic        0.76
amel         0.73
pelled       0.72
ciation      0.68
arette       0.67
drib         0.66
Activations Density 0.045%
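
The page does not define the density figure. A minimal sketch, assuming it means the share of sampled token positions on which the feature's activation is nonzero; the function name, threshold, and sample data below are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the feature
    fires at all; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 9 of 20,000 token positions.
sample = [0.0] * 19_991 + [1.3] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.045%
```

Under this reading, a density of 0.045% corresponds to roughly one active token in every 2,200.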