INDEX
Explanations
references to Time Magazine
Negative Logits
dyl       -0.77
chant     -0.71
rosis     -0.71
qqa       -0.69
rahim     -0.69
ounter    -0.68
warts     -0.67
rative    -0.64
rection   -0.63
riks      -0.63
Positive Logits
Warner     1.05
Magazine   0.97
magazine   0.84
glass      0.84
frame      0.82
Zone       0.80
zone       0.79
frames     0.78
Cube       0.76
borne      0.74
Activations Density 0.024%