INDEX
Explanations
concepts related to classification and categorization
Negative Logits
“He
-0.24
"He
-0.24
celui
-0.21
который
-0.19
μένος
-0.18
який
-0.18
který
-0.18
byli
-0.17
каждого
-0.17
him
-0.17
Positive Logits
adas
0.45
ellas
0.44
elles
0.44
áticas
0.42
她们
0.37
elles
0.37
ativas
0.36
ées
0.34
nicas
0.34
ones
0.34
Activations Density 0.091%