INDEX
Explanations
phrases indicating well-established or recognized concepts
Negative Logits
Frey      -0.71
גרת      -0.69
Frey      -0.63
ുടെ      -0.62
Julie     -0.62
yczą      -0.60
Cowper    -0.60
תוך      -0.59
Matti     -0.58
EI        -0.58
Positive Logits
known     2.22
Known     2.07
KNOWN     2.02
known     2.00
Known     1.98
KNOWN     1.81
connue    1.41
conocido  1.36
conocida  1.35
connu     1.31
Activations Density: 0.091%