INDEX
Explanations
references to historical developments and social structures
Negative Logits
Token           Logit
fraî            -0.51
volonté         -0.49
couverts        -0.49
nende           -0.47
SPECIFIED       -0.47
maggiori        -0.45
immediately     -0.45
甫              -0.45
thane           -0.43
immediatamente  -0.43
Positive Logits
Token           Logit
become          1.16
become          1.05
became          1.02
becoming        1.01
becomes         0.96
became          0.95
Become          0.93
becomes         0.91
Become          0.89
devenu          0.89
Activations Density 0.298%