INDEX
Explanations
lunch followed by parenthesis
Negative Logits
Token    Value
t        0.91
)        0.70
With     0.64
AB       0.63
d        0.63
_        0.63
llä      0.61
an       0.61
ן        0.61
User     0.60
Positive Logits
Token    Value
ра       1.04
at       1.02
ла       0.90
ме       0.83
lunches  0.81
lunch    0.80
ور       0.79
не       0.78
Dinner   0.77
ний      0.76
Activations Density 0.012%