Explanations
pretending or facades
The neuron activates on words or word pieces that denote pretending or putting on a façade (e.g. “pretense,” “finge” (Spanish/Italian for “pretends”), “façade”).
Negative Logits
στα          -0.06
ким          -0.06
anv          -0.06
numa         -0.06
.Res         -0.06
.Green       -0.06
-END         -0.05
utilisateur  -0.05
각각         -0.05
PageRoute    -0.05
Positive Logits
UNC          0.08
vers         0.07
ocities      0.07
superficial  0.07
VERS         0.07
vitality     0.07
mpeg         0.07
그래         0.07
Coron        0.07
game         0.06
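The two lists above rank vocabulary tokens by how strongly the neuron pushes their logits up or down. As a minimal sketch, assuming the effect is the direct dot product of the neuron's output weight vector with each token's unembedding vector (ignoring any final layer norm, which a real pipeline may fold in), the ranking could be computed as below. The helpers `logit_effects` and `top_and_bottom` and the toy weights are hypothetical, not the method used to produce this page.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def logit_effects(w_out: Vector, w_u: Dict[str, Vector]) -> Dict[str, float]:
    """Direct effect of the neuron on each token's logit: dot(w_out, unembedding vector)."""
    return {tok: sum(a * b for a, b in zip(w_out, col)) for tok, col in w_u.items()}

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted (positive) and k most-suppressed (negative) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative logit effects
    top = ranked[::-1][:k]     # most positive logit effects
    return top, bottom

# Hypothetical toy example with a 2-dimensional residual stream.
w_out = [1.0, -0.5]
w_u = {
    "superficial": [0.1, 0.0],
    "game": [0.05, 0.02],
    "PageRoute": [-0.03, 0.06],
}
top, bottom = top_and_bottom(logit_effects(w_out, w_u), k=1)
print(top, bottom)
```

Sorting the full vocabulary once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.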
Activation Density: 0.029%
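A density of 0.029% means the neuron fires on roughly 29 of every 100,000 tokens (0.029 / 100 × 100,000 = 29). As a minimal sketch, assuming "density" is the percentage of sampled tokens whose activation exceeds zero (the threshold actually used for this page is not stated), it could be computed as below; `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the neuron's activation exceeds `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: 29 firing tokens out of 100,000.
sample = [0.0] * 99_971 + [1.0] * 29
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low marks the neuron as highly selective, which fits a narrow concept such as pretending or façades.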