INDEX
Explanations
dates or timestamps
instances of the word "time."
Negative Logits
heses       -0.83
atform      -0.76
istani      -0.73
hetically   -0.71
iege        -0.71
acial       -0.70
qqa         -0.70
hips        -0.69
roma        -0.68
haar        -0.67
Positive Logits
ographed     0.87
Matters      0.72
Stamp        0.67
ously        0.65
Enforcement  0.64
mons         0.64
Garden       0.64
ograph       0.63
lda          0.63
Fav          0.62
Activations Density: 0.009%