INDEX
Explanations
conjunctions and specific nouns
Negative Logits
↵              0.91
I              0.72
avoir          0.72
O              0.70
A              0.67
M              0.65
automatically  0.64
espèce         0.64
obligated      0.64
eternal        0.63
Positive Logits
r              0.76
с              0.61
t              0.60
p              0.60
india          0.58
               0.58
staking        0.57
mil            0.56
uk             0.55
ّر             0.55
Activations Density 0.000%