INDEX
Explanations
numerical values formatted as text in a structured presentation, such as references or lists in a document
Negative Logits
millenn       -0.86
anooga        -0.78
querque       -0.73
ionics        -0.68
contests      -0.67
passionate    -0.66
masc          -0.66
tragedies     -0.66
histories     -0.65
mutual        -0.65
Positive Logits
806           1.17
608           1.14
708           1.13
504           1.13
641           1.13
758           1.12
70            1.11
756           1.11
807           1.11
688           1.10
Activations Density 0.378%
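
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means how often the feature fires, not how strongly.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, with the feature firing on 4 of them.
sample = [0.0] * 996 + [0.5, 1.2, 0.3, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.378% would mean the feature fires on roughly 4 of every 1000 tokens sampled.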