Explanations
The neuron detects occurrences of the exact phrase “same thing.”
Negative Logits
Token         Logit
ama           -0.06
์)            -0.06
Hola          -0.06
một           -0.06
_it           -0.06
&___          -0.06
apps          -0.06
ﺍ             -0.06
gatherings    -0.06
těla          -0.06
Positive Logits
Token         Logit
최저          0.07
Concrete      0.07
.vertices     0.07
Riding        0.07
.only         0.06
と思          0.06
Coefficient   0.06
:g            0.06
holders       0.06
.calculate    0.06
Activations Density 0.011%
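
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the fraction of sampled tokens on which the neuron's activation exceeds a threshold; the page does not state its exact definition. The function name, the threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the neuron fires) / (tokens sampled).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 11 of 100,000 tokens.
sample = [1.0] * 11 + [0.0] * 99_989
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.011%
```

Under that assumption, a density of 0.011% corresponds to the neuron firing on roughly 1 token in 9,000, which would fit a neuron that responds only to one specific phrase.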