Explanations
German words indicating actions and descriptions, particularly those related to human behavior and circumstances
Negative Logits
ettel        -0.20
uite         -0.16
Richardson   -0.15
azer         -0.15
abus         -0.15
езд          -0.14
jd           -0.14
icher        -0.14
lass         -0.14
dna          -0.14
Positive Logits
iert         0.26
agt          0.25
elt          0.25
gt           0.25
ает          0.24
igt          0.24
ibt          0.23
рует         0.23
ує           0.23
ывает        0.23
Activations Density 0.034%