Explanations
phrases emphasizing repetition or consistency
Negative Logits
Token        Logit
//           -0.64
Bauer        -0.61
zt           -0.60
йом          -0.57
Cone         -0.56
هاند         -0.56
駒           -0.55
时候         -0.55
Magdalene    -0.55
ässä         -0.54
Positive Logits
Token        Logit
every        1.69
every        1.64
EVERY        1.64
EVERY        1.63
Every        1.55
Every        1.52
Ogni         1.24
Jedes        1.15
Jede         1.10
Elke         1.10
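Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method; it is not confirmed by this page. The function name `logit_lens_tokens` and the argument names `w_dec_row`, `W_U` and `vocab` are hypothetical placeholders.

```python
import numpy as np

def logit_lens_tokens(w_dec_row, W_U, vocab, k=10):
    """Return the k lowest- and k highest-logit tokens for one feature.

    Assumption: w_dec_row is the feature's decoder vector, shape (d_model,),
    and W_U is the unembedding matrix, shape (d_model, vocab_size).
    """
    logits = w_dec_row @ W_U                 # one logit per vocabulary entry
    order = np.argsort(logits)               # ascending: most negative first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy example with a 1-dimensional model and a 4-token vocabulary.
vocab = ["every", "Every", "Bauer", "//"]
W_U = np.array([[1.0, 0.5, -0.2, -0.6]])
w_dec_row = np.array([1.0])
bottom, top = logit_lens_tokens(w_dec_row, W_U, vocab, k=2)
print("negative:", bottom)
print("positive:", top)
```

Under this reading, the repeated "every" and "EVERY" rows above would be distinct vocabulary entries (for example, variants with and without a leading space) whose surrounding whitespace was lost when the page was extracted.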
Activation Density: 0.106%
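Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero, measured over some sample of tokens. Both that threshold and the name `activation_density` are illustrative assumptions, not definitions taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold.

    Assumption: acts holds one activation value per token position, and
    "active" means strictly greater than the threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 2 of 5 token positions.
density = activation_density([0.0, 0.0, 0.5, 0.0, 1.2])
print(f"{density:.1%}")
```

At 0.106%, the feature is active on roughly 1 token in 940 (1 / 0.00106 ≈ 943), so it is a sparse feature.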