Explanations
TV show reviews
The neuron detects named entities (especially capitalized proper names and TV‐show titles).
Negative Logits
xmin          -0.07
param         -0.07
Carly         -0.07
vtk           -0.06
ัต            -0.06
preschool     -0.06
Vietnam       -0.06
.broadcast    -0.06
stud          -0.06
pok           -0.06
Positive Logits
offending           0.07
\"]                 0.07
ContentAlignment    0.06
yms                 0.06
countered           0.06
taxing              0.06
prisoner            0.06
ениями              0.06
9                   0.06
}\"                 0.06
Activations Density: 0.022%