Explanations
dates formatted as month followed by year
Negative Logits
Token         Logit
ãĤ´ãĥ³        -0.83
ãĤ¦ãĤ¹        -0.75
alo           -0.72
arily         -0.72
arah          -0.69
SourceFile    -0.67
Otherwise     -0.65
ãĥ¥           -0.63
Widget        -0.61
Done          -0.60
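
Some tokens in the negative-logit list (ãĤ´ãĥ³, ãĤ¦ãĤ¹, ãĥ¥) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The following is a minimal sketch for decoding them, assuming the model uses the GPT-2 style byte-to-unicode table; the source does not name the tokenizer, so that is an assumption, and the helper names are hypothetical.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values (0-255) to printable characters.

    Assumption: the dashboard's tokenizer uses this table; the source does not say.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary string back into the text it encodes (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ´ãĥ³", "ãĤ¦ãĤ¹", "ãĥ¥", "alo"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the three strings decode to katakana fragments, while ASCII tokens such as alo pass through unchanged.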
Positive Logits
Token         Logit
however       1.08
meanwhile     0.80
when          0.78
when          0.77
moreover      0.73
though        0.68
according     0.67
researchers   0.66
tensions      0.65
we            0.64
Activations Density 0.734%