INDEX
Explanations
words related to thoughts and responses in discussions or comments
Negative Logits
elon      -0.16
unk       -0.15
usted     -0.14
æĹ        -0.14
ĶĦ        -0.14
eras      -0.13
oro       -0.13
uffman    -0.13
infeld    -0.13
168       -0.13
Positive Logits
-eslint   0.16
pla       0.15
baÅŁ      0.15
borg      0.14
folk      0.14
oub       0.14
-mini     0.13
ehr       0.13
icas      0.13
kers      0.13
Activations Density 0.004%