INDEX
Explanations
instances of the word "and" followed by other words, particularly when multiple occurrences are close together with high activation values
connections and conjunctions in sentences
Negative Logits
advertisement   -0.74
arro            -0.69
edia            -0.67
tesy            -0.67
prosecut        -0.66
ONSORED         -0.63
coat            -0.63
hower           -0.63
theless         -0.62
bluff           -0.62
Positive Logits
valleys         0.97
uries           0.95
oranges         0.93
Soviets         0.80
necks           0.77
territories     0.75
Territories     0.75
Earthqu         0.73
Titans          0.72
izons           0.72
Activations Density 0.271%
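
The dashboard does not say how the density figure is computed. A minimal sketch in Python, assuming it is the share of sampled tokens on which the feature has a nonzero activation; the function name `activation_density` and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 271 of 100,000 tokens.
sample = [1.0] * 271 + [0.0] * (100_000 - 271)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.271% would mean the feature is active on roughly 1 in every 370 tokens.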