Explanations
personal names in news articles
proper nouns, particularly names and organizations, likely associated with news events
Negative Logits
reins          -0.74
forwards       -0.63
recons         -0.62
psychiat       -0.60
decomp         -0.59
sands          -0.58
viz            -0.58
compuls        -0.57
tending        -0.56
anew           -0.55
Positive Logits
photo           0.80
Images          0.73
IMAGES          0.71
SPONSORED       0.71
RTX             0.69
ffic            0.68
Generic         0.67
<|endoftext|>   0.65
Distribution    0.64
uers            0.63
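The two logit lists are consistent with a common dashboard recipe: project the feature's decoder direction through the model's unembedding matrix and keep the highest- and lowest-scoring vocabulary tokens. This page does not state that recipe, so treat it as an assumption. A minimal Python sketch of it follows; the helper names `direct_logit_effect` and `top_and_bottom_tokens` and all demo numbers are hypothetical.

```python
# Hypothetical sketch: derive "positive/negative logits" lists for one SAE feature.
# Assumption: the score for token t is the dot product of the feature's decoder
# direction with column t of the unembedding matrix W_U (shape d_model x vocab).

def direct_logit_effect(decoder_row, unembed):
    """Return one score per vocabulary token: decoder_row @ unembed."""
    if len(decoder_row) != len(unembed):
        raise ValueError("decoder_row length must equal the number of unembed rows")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[d] * unembed[d][t] for d in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(scores, vocab, k=10):
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    if len(scores) != len(vocab):
        raise ValueError("scores and vocab must have the same length")
    pairs = list(zip(vocab, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # largest first
    negative = sorted(pairs, key=lambda p: p[1])[:k]  # most negative first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = ["photo", "reins", "the", "Images"]
    decoder_row = [1.0, 0.5]
    unembed = [
        [0.6, -0.5, 0.0, 0.5],
        [0.4, -0.48, 0.0, 0.46],
    ]
    scores = direct_logit_effect(decoder_row, unembed)
    positive, negative = top_and_bottom_tokens(scores, vocab, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

A real dashboard would do the projection with the model's full unembedding matrix in a tensor library; the nested lists here only make the arithmetic visible.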
Activation Density: 0.102%
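Activation density is usually read as the share of sampled tokens on which the feature fires at all; at 0.102%, that is roughly one token in a thousand. A minimal Python sketch assuming that definition follows; the function name `activation_density` and its `threshold` parameter (zero by default) are hypothetical, and the sample activations are made up.

```python
# Hypothetical sketch of an activation-density statistic for one feature.
# Assumption: density = fraction of sampled tokens whose activation is above threshold.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Made-up activations over five tokens: the feature fires on two of them.
    sample = [0.0, 0.0, 0.5, 0.0, 1.2]
    print(f"density: {activation_density(sample):.1%}")
```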