Explanations
references to dining or eating experiences
Negative Logits
!*\              -0.62
selaer           -0.57
\}\\             -0.54
tenberg          -0.53
ITUTE            -0.53
achable          -0.53
ollectionView    -0.52
allo             -0.50
StructEnd        -0.50
Representative   -0.49
Positive Logits
dining    2.66
Dining    2.45
dining    2.36
Dining    2.28
dine      1.67
dined     1.44
din       1.39
Dine      1.31
Din       1.24
Dine      1.24
Activation density: 0.092%
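The sketch below shows one common way figures like the logit lists and activation density above are computed for an SAE feature. It assumes two things: the logit lists come from projecting the feature's decoder direction through the model's unembedding matrix, and density counts token positions whose activation is above zero. Both are assumptions, not confirmed details of this dashboard. The function names, array shapes, and toy numbers are hypothetical and are not this page's actual code or data.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction onto the unembedding.

    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    Returns (top-k positive, top-k negative) lists of (token, score).
    """
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # indices sorted by ascending score
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return pos, neg

def activation_density(acts):
    """Hypothetical helper: fraction of token positions where the feature fires (activation > 0)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy data (invented for illustration): d_model = 2, vocabulary of 4 tokens.
vocab = ["dining", "dine", "selaer", "ITUTE"]
W_U = np.array([[2.0, 1.0, -0.5, -0.3],
                [0.5, 0.4, -0.1, -0.2]])
decoder_dir = np.array([1.0, 0.5])

pos, neg = logit_effects(decoder_dir, W_U, vocab, k=2)
density = activation_density([0, 0, 0.7, 0, 0, 0, 0, 0, 0, 0])
print(pos, neg, f"{density:.1%}")
```

Under these assumptions, a density of 0.092% would mean the feature is nonzero on roughly 1 in every 1,000 token positions.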