INDEX
Explanations
This neuron detects words describing cancellation or neutralization of impulses or forces.
Negative Logits
了解         -0.06
"'↵        -0.06
ICMP       -0.06
personals  -0.06
текст      -0.06
litres     -0.06
births     -0.06
nevy       -0.06
ados       -0.06
pistols    -0.06
Positive Logits
.ll          0.07
trimmed      0.07
SOCIAL       0.07
Called       0.06
Committee    0.06
FileManager  0.06
ModelState   0.06
atıcı        0.06
.С           0.06
.sy          0.06
Activations Density: 0.007%