Explanations
detection
The neuron specifically activates on the word “Detection,” most prominently in the phrase “AI Detection.”
Negative Logits
Token     Logit
ew        -0.07
efa       -0.06
wik       -0.06
mnist     -0.06
جام       -0.06
Scots     -0.06
św        -0.06
.Bold     -0.06
word      -0.06
ert       -0.06
Positive Logits
Token          Logit
ніш            0.07
Init           0.07
democratic     0.06
.imageView     0.06
":"","         0.06
€↵             0.06
álně           0.06
하시           0.06
messageType    0.06
-Trump         0.06
Activations Density 0.002%
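
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled tokens on which the neuron's activation is above zero. The function name `activation_density` and the `threshold` parameter are hypothetical and do not come from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: one activation value per sampled token; a token counts as
    "active" when its value is strictly greater than the threshold.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only: 2 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_998 + [1.5, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.002% corresponds to roughly 2 active tokens per 100,000 sampled, which fits a neuron that fires only on a narrow pattern such as the word "Detection."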