Explanations
concepts related to equality and unity
Negative Logits
orts     -0.19
yal      -0.17
unga     -0.17
ummer    -0.16
yle      -0.15
apur     -0.15
und      -0.15
anken    -0.14
rollo    -0.14
rol      -0.14
Positive Logits
Pig      0.16
adera    0.16
egov     0.16
haus     0.15
infeld   0.15
Seah     0.15
Pot      0.15
éģº      0.15
ذا       0.15
isman    0.14
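The page does not say how these logit values are produced. A common approach on feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the lowest and highest entries. The sketch below assumes that approach; `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the toy inputs are illustrative, not taken from this feature.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical sketch: rank tokens by the feature's direct effect on logits.

    decoder_dir: (d_model,) decoder vector for one feature (assumed).
    W_U: (d_model, vocab_size) unembedding matrix (assumed).
    vocab: list of token strings, length vocab_size.
    """
    # Direct logit effect of the feature on every vocabulary token.
    effects = decoder_dir @ W_U
    # Ascending order: most negative effects first.
    order = np.argsort(effects)
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.2, -0.1, 0.05],
                [0.0, 0.0, 0.0]])
bottom, top = top_logit_effects(decoder_dir, W_U, ["a", "b", "c"], k=1)
print(bottom, top)  # [('b', -0.1)] [('a', 0.2)]
```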
Activation density: 0.015%
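Activation density is usually read as the fraction of tokens on which the feature fires. The page does not state the exact definition, so this minimal sketch assumes "fires" means an activation strictly above zero. The helper name `activation_density` is hypothetical. Under that assumption, 0.015% corresponds to about 15 firing tokens per 100,000.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical sketch: fraction of tokens whose activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy usage: 15 nonzero activations out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:15] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.015%
```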