INDEX
Explanations
years and time periods
the presence of commas and conjunctions
Negative Logits
olo        -0.75
izen       -0.74
iously     -0.74
enary      -0.71
ries       -0.68
ivated     -0.67
eworthy    -0.66
obos       -0.66
uce        -0.65
iev        -0.65
Positive Logits
albeit     0.96
meaning    0.96
huh        0.94
whereas    0.90
hence      0.86
but        0.83
although   0.79
eh         0.72
insofar    0.70
therefore  0.70
Activations Density: 0.441%