Explanations
This neuron identifies occurrences of the legal term “tort.”
Negative Logits
shameful      -0.07
بای           -0.07
, ↵           -0.06
anal          -0.06
CU            -0.06
usalem        -0.06
neměl         -0.06
/high         -0.06
�             -0.06
.listFiles    -0.06
Positive Logits
tort          0.08
Tort          0.08
ort           0.07
hann          0.06
fortified     0.06
ایز           0.06
доход         0.06
Earth         0.06
도별          0.06
-rated        0.06
Activations Density 0.002%