INDEX
Explanations
references to specific individuals, particularly names
Negative Logits
ãĥ¼ãĥĹ    -0.15
erro      -0.14
unner     -0.14
-grow     -0.14
ality     -0.14
eru       -0.14
erre      -0.13
andas     -0.13
aktion    -0.13
getter    -0.13
Positive Logits
son       0.17
uddy      0.14
asel      0.14
mata      0.13
chr       0.13
μÎŃ       0.13
sonian    0.13
SON       0.13
roman     0.13
manship   0.12
Activations Density: 0.506%