Explanations
the word "without" in various contexts related to absence or lack
Negative Logits
egan       -0.15
álo        -0.15
het        -0.14
atching    -0.14
ilden      -0.14
antar      -0.14
hiba       -0.14
-tm        -0.14
auen       -0.13
rent       -0.13
Positive Logits
regard       0.23
stood        0.20
necessarily  0.19
being        0.18
för          0.17
abox         0.17
prejudice    0.17
knowing      0.17
/by          0.17
/out         0.17
Activations Density: 0.038%