Explanations
References to abstract concepts and moral themes, particularly charity and ethics.
Negative Logits
Token       Logit
Kane        -0.30
Wade        -0.29
rite        -0.28
inge        -0.28
lane        -0.28
oke         -0.28
italiane    -0.28
lete        -0.28
pose        -0.27
pute        -0.27
Positive Logits
Token       Logit
lovak       0.14
esidir      0.13
CLUD        0.13
anism       0.13
acidad      0.13
zon         0.13
ison        0.13
anium       0.13
entin       0.12
Verfüg      0.12
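Dashboards of this kind commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below illustrates that procedure under that assumption. The names `logit_effects`, `decoder_direction`, `W_U` and `vocab` are hypothetical, the toy values are made up, and any scaling or normalization this particular dashboard applies is not stated in the source.

```python
from typing import List, Tuple


def logit_effects(
    decoder_direction: List[float],
    W_U: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical sketch: a token's score is the dot product of the feature's
    decoder direction (length d_model) with that token's column of the
    unembedding matrix W_U (shape d_model x d_vocab).
    """
    d_model = len(decoder_direction)
    scores = []
    for t, token in enumerate(vocab):
        # Direct effect of the feature direction on this token's logit.
        s = sum(decoder_direction[j] * W_U[j][t] for j in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
direction = [1.0, -0.5]
W_U = [
    [0.2, -0.3, 0.1, 0.0],
    [0.4, 0.2, -0.2, 0.1],
]
vocab = ["alpha", "beta", "gamma", "delta"]
neg, pos = logit_effects(direction, W_U, vocab, k=2)
print(neg)
print(pos)
```

Here "beta" scores about -0.4 and "gamma" about 0.2, so they head the negative and positive lists respectively.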
Activation Density: 0.523%
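Activation density is usually read as the percentage of sampled tokens on which the feature fires. Under that reading, 0.523% means the feature is active on roughly one token in 190. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` is hypothetical, and the dashboard does not state its sampling corpus or threshold.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical sketch: assumes density = (# firing tokens) / (# tokens) * 100.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 5 firing tokens out of 1,000 sampled tokens gives a density of 0.5%.
acts = [0.0] * 995 + [1.2] * 5
print(activation_density(acts))
```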