Explanations
Financial and numerical data presented in sentences
References to specific years and numerical data related to events
Negative Logits
Token        Logit
edly         -0.88
ens          -0.77
emo          -0.69
THING        -0.67
issance      -0.66
ella         -0.64
fuck         -0.61
rang         -0.60
stuff        -0.60
Factor       -0.60
Positive Logits
Token        Logit
versions      1.04
conjunction   1.04
favor         1.03
accordance    1.02
verts         0.99
increments    0.98
total         0.98
spite         0.93
animate       0.92
comparison    0.91
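Both logit lists are the extremes of a per-token score table, ranked by sign and magnitude. The sketch below shows that ranking step in Python. It assumes a hypothetical `logit_effects` dict mapping each token to its score, filled here with a few of the values listed above. How the dashboard computes each score is not stated in this card, so only the sorting is illustrated.

```python
def top_logits(logit_effects, k=10):
    """Return (most negative, most positive) token/score pairs, up to k of each."""
    # Sort ascending by score so the most negative effects come first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = [pair for pair in ranked if pair[1] < 0][:k]
    # Walk the ranking from the top so the largest positive effect comes first.
    positive = [pair for pair in reversed(ranked) if pair[1] > 0][:k]
    return negative, positive


# Hypothetical input built from a subset of the scores listed above.
logit_effects = {"edly": -0.88, "ens": -0.77, "emo": -0.69,
                 "versions": 1.04, "favor": 1.03, "accordance": 1.02}
negative, positive = top_logits(logit_effects, k=3)
print(negative)  # [('edly', -0.88), ('ens', -0.77), ('emo', -0.69)]
print(positive)  # [('versions', 1.04), ('favor', 1.03), ('accordance', 1.02)]
```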
Activation Density: 0.164%
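Activation density is commonly reported as the percentage of tokens on which a feature's activation is above zero. This card does not define the figure, so that definition is an assumption here. The Python sketch below applies it to a hypothetical list of per-token activations and reproduces a 0.164% density: 164 active tokens out of 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 100,000 tokens, of which 164 carry a nonzero activation.
acts = [0.0] * 100_000
for i in range(164):
    acts[i * 600] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.164%
```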