INDEX
Explanations
assertions and references to understanding or knowledge
Negative Logits
Token              Logit
antaranya          -0.77
bordada            -0.73
tvguidetime        -0.71
AssemblyCulture    -0.67
voidaan            -0.67
adicionales        -0.66
restantes          -0.64
lisäksi            -0.62
supplémentaire     -0.62
aportes            -0.60
Positive Logits
Token              Logit
humans             0.90
people             0.83
Americans          0.79
women              0.75
everyone           0.72
Humans             0.70
modern             0.70
日本の             0.70
capitalism         0.70
everybody          0.69
Activations Density: 0.527%