INDEX
Explanations
references to household items and activities
Negative Logits
strup       -0.16
295         -0.15
rir         -0.14
icle        -0.14
ogan        -0.14
followed    -0.14
Leban       -0.14
245         -0.13
urai        -0.13
emu         -0.13
Positive Logits
ertools     0.16
ooks        0.16
Ded         0.15
spark       0.15
ipa         0.15
ìļ©         0.15
-grade      0.15
ÑĪем        0.15
nah         0.15
ORIES       0.14
Activations Density: 0.330%