INDEX
Explanations
information about specific topics or entities as described in the "<|endoftext|>" sections
occurrences of the word "About."
Negative Logits
Token         Logit
²¾            -0.72
itely         -0.69
itiz          -0.66
oly           -0.65
rift          -0.65
alian         -0.64
eful          -0.62
cage          -0.62
efully        -0.62
tast          -0.62

Positive Logits
Token         Logit
About          0.94
Citation       0.74
Us             0.73
ahime          0.71
Submit         0.70
Consent        0.70
Seym           0.69
WATCHED        0.68
doms           0.68
demographics   0.67
Activations Density 0.023%
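
For a sense of scale, the density figure can be read as a fraction of token positions on which the feature is active. The sketch below assumes that reading (density = active tokens / total tokens), which the page itself does not spell out. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only to reproduce the 0.023% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means active tokens divided by total tokens.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 23 of 100,000 token positions.
sample = [1.5] * 23 + [0.0] * (100_000 - 23)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.023% corresponds to roughly 23 activating tokens per 100,000, which fits a sparse feature that mostly responds to the word "About".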