INDEX
Explanations
words that indicate alternatives or choices
Negative Logits
Griffith      -0.69
Wiseman       -0.66
unay          -0.59
enheim        -0.58
Uli           -0.57
Vivian        -0.56
bang          -0.54
rishnan       -0.54
Ƚ             -0.54
highlights    -0.54
Positive Logits
Either        2.10
Either        2.08
either        2.05
either        2.00
ITHER         1.69
entweder      1.65
ither         1.26
enten         1.26
weder         1.23
要么          1.08
Activations Density: 0.064%