INDEX
Explanations
references to sections, notes, and subsections within the document
Negative Logits
Blue         -0.56
...          -0.54
homonymie    -0.54
multiple     -0.53
imia         -0.52
idere        -0.51
mar          -0.50
he           -0.49
nước         -0.49
t            -0.48
Positive Logits
$_"          0.85
myſelf       0.84
Theſe        0.80
Monfieur     0.80
صوتيه        0.80
ainfi        0.79
Efq          0.79
الحياه       0.78
auffi        0.76
✨:          0.76
Activations Density 1.103%