Explanations
The neuron fires on mentions of “room” (and its variants) and on associated room-detail tokens in hotel reviews.
Negative Logits
tracks        -0.07
bindings      -0.07
isim          -0.06
_MANAGER      -0.06
editors       -0.06
Constraints   -0.06
IAM           -0.06
typeName      -0.06
hoàng         -0.06
шло           -0.06
Positive Logits
unins         0.07
geme          0.07
AUT           0.07
conventional  0.07
Wichita       0.07
Zy            0.07
Geh           0.07
suất          0.06
裡            0.06
miserable     0.06
Activations Density 0.009%
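
For a sense of scale on the density figure, here is a minimal sketch that assumes "activations density" means the fraction of tokens on which the neuron's activation is above zero. The page does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.009%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is read here as (active tokens) / (all tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron is active on 9 tokens out of 100,000.
acts = [0.0] * 99_991 + [1.5] * 9
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.009%
```

Under this reading, a density of 0.009% corresponds to roughly 9 active tokens per 100,000, which fits a neuron tied to a narrow topic such as hotel rooms.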