Explanations
references to cultural or historical figures and titles
Negative Logits
personne        -0.67
miembro         -0.53
miembros        -0.53
member          -0.52
membre          -0.52
Personne        -0.51
membro          -0.51
osoba           -0.50
Membre          -0.50
membros         -0.49
Positive Logits
wise            0.72
posedge         0.66
Wise            0.65
Sheikh          0.64
king            0.62
Wise            0.62
prophet         0.61
sage            0.60
ScopeManager    0.60
Duke            0.58
Activations Density: 0.392%