INDEX
Explanations
The neuron is specifically searching for words related to tear gas
references to tears and emotional distress
Negative Logits
atar        -0.84
orea        -0.72
ancial      -0.71
stood       -0.70
eport       -0.68
raviolet    -0.66
enza        -0.66
ammy        -0.66
ocre        -0.65
enhagen     -0.65
Positive Logits
bows        1.04
bow         0.95
ful         0.89
stals       0.81
fully       0.80
bian        0.80
stained     0.78
tears       0.73
iffs        0.73
stal        0.72
Activations Density 0.016%