Explanations
phrases that denote the importance or role of specific factors or entities in various contexts
Negative Logits
eryllium     -0.53
tuot         -0.50
immédi       -0.50
íticas       -0.50
fleste       -0.49
negoti       -0.49
arrière      -0.48
pronon       -0.48
cardiaque    -0.48
inars        -0.48
Positive Logits
role            0.91
peran           0.81
Role            0.79
Role            0.76
ROLE            0.72
role            0.72
SourceChecksum  0.72
ruolo           0.70
роль            0.67
ROLE            0.63
Activations Density: 0.598%