INDEX
Explanations
terms related to fundamental concepts or principles
Negative Logits
bie     -0.15
Morg    -0.14
spiel   -0.14
Ãły     -0.14
bac     -0.14
icans   -0.14
ött     -0.14
edom    -0.14
ican    -0.14
irm     -0.14
Positive Logits
mente        0.20
flaw         0.17
importance   0.17
shift        0.17
ist          0.16
differences  0.16
ists         0.16
difference   0.16
ism          0.16
arily        0.15
Activations Density 0.028%