Explanations
numerical values and their formatting
Negative Logits
er        -0.66
sc        -0.58
ीय        -0.57
Jack      -0.57
bas       -0.55
-         -0.55
ों        -0.54
bu        -0.51
Ger       -0.51
ьев       -0.51
Positive Logits
chofe     1.11
Anſ       1.11
Houſe     1.11
paravant  1.04
cauſe     1.03
againſt   1.03
uſed      1.01
laſt      1.01
Eſ        1.01
uſe       1.01
Activations Density 0.128%