Explanations
words related to current events and news articles
Negative Logits
gdala       -0.82
vironment   -0.79
pheus       -0.79
uably       -0.77
ortium      -0.75
izon        -0.74
eryl        -0.74
liga        -0.74
izons       -0.71
oral        -0.70
Positive Logits
paced        1.48
idious       1.27
pace         1.06
turnaround   1.02
speeds       0.97
running      0.94
lear         0.90
speed        0.87
pacing       0.87
ened         0.85
Activations Density: 0.767%