INDEX
Explanations
specific instructions or configurations in text
occurrences of the word "this."
NEGATIVE LOGITS
lev          -0.74
ometown      -0.68
isms         -0.66
eteenth      -0.65
iberal       -0.65
Izan         -0.63
eming        -0.63
uther        -0.62
borne        -0.61
letters      -0.60
POSITIVE LOGITS
wiki          0.93
latter        0.86
particular    0.79
diagram       0.79
topic         0.79
addon         0.77
webcam        0.76
endpoint      0.76
week          0.75
site          0.75
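The two logit lists rank vocabulary tokens by how strongly this feature's output direction raises or lowers their logits. A minimal sketch of how such a ranking can be produced, assuming each value is the dot product of the feature's decoder direction with a token's unembedding column (the dashboard may scale or normalize these values; that detail is not stated here). The function name `top_logit_effects` and the toy matrices are hypothetical.

```python
def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the logit effect of a feature's decoder direction.

    Assumptions (hypothetical, for illustration only):
      decoder_dir: list of d_model floats, the feature's decoder direction
      W_U: unembedding matrix as d_model rows, each of length len(vocab)
      vocab: list of token strings
    Returns (positive, negative): the k highest and k lowest (token, effect) pairs.
    """
    # Effect on token t = sum over hidden dims of decoder_dir[i] * W_U[i][t]
    effects = [
        sum(d * row[t] for d, row in zip(decoder_dir, W_U))
        for t in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example: a 2-dimensional model with a 3-token vocabulary
pos, neg = top_logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 0.9],
     [0.3, 0.3, 0.3]],
    ["a", "b", "c"],
    k=2,
)
print("positive:", pos)
print("negative:", neg)
```

With real weights, `decoder_dir` would be the feature's row of the autoencoder decoder and `W_U` the language model's unembedding, so the top entries correspond to lists like the ones above.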
Activations Density 0.205%
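Activation density is the share of sampled tokens on which the feature fires. A value of 0.205% corresponds to roughly 2 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold used for this dashboard is not stated); the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 205 firing tokens out of 100,000 sampled tokens
acts = [1.0] * 205 + [0.0] * (100_000 - 205)
print(f"{activation_density(acts):.3f}%")
```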