INDEX
Explanations
general statements or overviews in text
Negative Logits
aily        -0.73
him         -0.70
ËĪ          -0.70
fest        -0.66
imm         -0.64
ocaust      -0.62
aciously    -0.61
ocracy      -0.60
gio         -0.59
vg          -0.58
Positive Logits
adays       0.98
speaking    0.89
ccording    0.80
entimes     0.77
Speaking    0.73
we          0.73
there       0.73
terday      0.73
though      0.69
commenters  0.69
Activations Density: 0.131%