Explanations
The neuron is looking for mentions of legal terms and institutions
the letter 'a' in various contexts throughout the document
Negative Logits
Token      Logit
lasses     -0.74
ARGET      -0.70
ymm        -0.68
ONSORED    -0.66
<[         -0.65
inances    -0.65
pter       -0.64
WATCHED    -0.64
rored      -0.64
onel       -0.62

Positive Logits
Token      Logit
ñ          1.15
qua        0.97
ð          0.96
ption      0.95
ña         0.92
BILITY     0.91
vel        0.90
ichi       0.89
emia       0.89
ishi       0.87
Activations Density 0.071%