Explanations
temporal references, such as specific years or time periods
references to temporal concepts and personal experiences
Negative Logits
Token             Logit
issors            -0.79
=-=-=-=-=-=-=-=-  -0.74
nce               -0.73
omy               -0.71
LOD               -0.71
regate            -0.69
Volcano           -0.69
imation           -0.69
acly              -0.69
ancial            -0.68
Positive Logits
Token             Logit
unwilling         1.65
unaware           1.58
addicted          1.58
accustomed        1.58
afraid            1.56
dissatisfied      1.55
aware             1.55
unable            1.54
ashamed           1.54
fearful           1.52
Activations Density: 0.565%