Explanations
References to academic or scientific citations and authors
Negative Logits
ieri      -0.18
agne      -0.16
ortal     -0.15
ensen     -0.15
undo      -0.14
ensch     -0.14
late      -0.14
prite     -0.14
Traits    -0.14
τικα      -0.13
Positive Logits
angelo    0.17
uegos     0.14
merc      0.14
ingers    0.14
bjerg     0.14
ABI       0.14
unami     0.14
avs       0.14
ekli      0.14
ingu      0.14
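Lists like the two above are commonly produced by scoring every vocabulary token against the feature and keeping the extremes. The sketch below assumes the score is the dot product of the feature's decoder direction with each token's unembedding vector; that definition, the function names `logit_effects` and `top_and_bottom`, and the toy two-dimensional vectors are all hypothetical illustrations, not the dashboard's actual code.

```python
def logit_effects(decoder_dir, unembed):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (assumed definition)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Hypothetical toy vectors; a real model uses d_model-sized vectors
# and scores the entire vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    "angelo": [0.2, 0.1],
    "merc": [0.1, 0.0],
    "ieri": [-0.1, 0.2],
    "late": [0.0, 0.3],
}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Negative:", [(t, round(s, 2)) for t, s in neg])
print("Positive:", [(t, round(s, 2)) for t, s in pos])
```

Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; for a full vocabulary, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid sorting every token.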
Activation density: 0.043%
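Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the synthetic activation list are hypothetical, chosen only so the arithmetic matches the 0.043% figure (43 firing tokens out of 100,000).

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 tokens, of which 43 carry a positive activation.
acts = [0.0] * 100_000
for i in range(43):
    acts[i * 2000] = 1.5  # spread the firing tokens through the sample

print(f"{activation_density(acts):.3%}")
```

A density this low means the feature is silent on almost every token, which fits a narrow explanation such as citation and author references.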