INDEX
Explanations
references to specific days or dates
phrases that refer to significant days or events
Negative Logits
emort        -0.83
negie        -0.78
itars        -0.78
icide        -0.76
emale        -0.73
icides       -0.72
bourg        -0.71
umbn         -0.70
cientious    -0.69
icultural    -0.69
Positive Logits
dream        1.47
care         0.94
light        0.93
lights       0.87
trip         0.84
break        0.83
flower       0.83
TON          0.81
lighting     0.79
walker       0.77
Activations Density 0.064%