Explanations
name calling or name change
Negative Logits
noticia        -0.79
potenciales    -0.79
больших        -0.78
angefangen     -0.78
angaben        -0.78
Tarea          -0.75
există         -0.75
Kenapa         -0.74
zuhause        -0.74
Kennedy        -0.71
Positive Logits
dropping       1.04
name           1.02
tags           0.94
names          0.92
puisi          0.91
NOPQRST        0.91
dropper        0.90
tag            0.90
plates         0.88
npos           0.88
Activations Density: 0.027%