Explanations
Dates expressed as a year
Time-related phrases, especially those referring to a specific point in the past
NEGATIVE LOGITS
Token               Logit
¬¼                  -0.69
eries               -0.68
tro                 -0.67
atio                -0.66
irc                 -0.66
alogy               -0.65
Generic             -0.60
asma                -0.59
natureconservancy   -0.59
roid                -0.59
POSITIVE LOGITS
Token               Logit
according           0.95
although            0.91
culminating         0.89
prompting           0.85
huh                 0.82
marking             0.81
leaving             0.80
when                0.79
namely              0.79
albeit              0.78
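The positive and negative logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. The sketch below shows one common way such a ranking can be produced. It is an assumption, not necessarily how this dashboard computes it: each token's effect is taken to be the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The functions `logit_effects` and `top_logits` are hypothetical, and the toy numbers are made up. They are not the values listed above.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTION: the effect on each token is the dot product of the feature's
# decoder direction with that token's column of the unembedding matrix.
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir @ unembed, where unembed has shape d_model x vocab."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(tokens, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with made-up numbers (hypothetical tokens, not real dashboard data).
tokens = ["tok_a", "tok_b", "tok_c"]
decoder_dir = [1.0, -0.5]
unembed = [[0.9, -0.2, 0.1],
           [0.0, 0.8, -0.4]]
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_logits(effects, tokens, k=1)
print(pos, neg)
```

Sorting the full list twice keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` avoids a full sort.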
Activation density: 0.314%
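Activation density is the share of tokens on which the feature is active. At 0.314%, the feature fires on roughly 3 of every 1,000 tokens (0.00314 × 1,000 = 3.14). The minimal sketch below computes the metric under one assumption: a token counts as "active" when its activation is strictly above a threshold, with a default of 0. The function name and the `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature fires. ASSUMPTION: "fires" means activation > threshold (default 0).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: one active token out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

A low density like the one reported here means the feature is sparse. Most tokens leave it at zero, which fits the narrow, date-related explanations above.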