INDEX
Explanations
mentions of academic subjects or concepts, particularly mathematics and science
Negative Logits
Sov         -0.79
ktop        -0.70
lease       -0.65
views       -0.65
Dialogue    -0.61
hold        -0.60
Marcos      -0.59
flesh       -0.59
Border      -0.58
Luxem       -0.58
Positive Logits
matical     1.18
hematic     1.12
hemat       1.02
ilda        1.01
ieu         1.01
ilde        0.98
ias         0.92
equations   0.90
ians        0.86
gebra       0.86
Activations Density 0.027%