INDEX
Explanations
the word "observers" with high activation values
mentions of observers or observational roles
Negative Logits
street     -0.67
eways      -0.67
Hom        -0.67
False      -0.66
Enough     -0.63
rax        -0.62
Bio        -0.62
sis        -0.62
Customer   -0.62
eating     -0.61

Positive Logits
observers   1.28
observer    1.22
wat         1.03
acers       0.92
observing   0.87
auts        0.82
opol        0.81
onlook      0.79
"$:/        0.78
rejoice     0.78
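The following is a minimal sketch of how ranked positive and negative logit lists like these might be produced. It assumes, since this page does not say so, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The helpers `logit_effects` and `top_and_bottom` and all vector values are hypothetical toy data, not this feature's real weights.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: a 2-d feature decoder direction and a tiny
# "unembedding" with one vector per vocabulary token.
DECODER_DIR = [1.0, 0.5]
TOY_UNEMBED: Dict[str, List[float]] = {
    "observers": [1.0, 0.5],
    "observer": [0.9, 0.4],
    "street": [-0.5, -0.3],
    "False": [-0.4, -0.4],
    "the": [0.1, -0.2],
}


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed method: dot the decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    bottom = ranked[:k]           # most negative first
    top = ranked[::-1][:k]        # most positive first
    return top, bottom


effects = logit_effects(DECODER_DIR, TOY_UNEMBED)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", top)
print("negative:", bottom)
```

With these toy vectors, "observers" and "observer" rank highest and "street" and "False" rank lowest, mirroring the shape (not the values) of the lists on this page.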

Activation Density 0.013%
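A density of 0.013% means the feature fires on roughly 13 of every 100,000 tokens. The sketch below computes density as the percentage of token positions with activation above a threshold. The zero threshold and the helper name `activation_density_percent` are assumptions for illustration, and the activations are synthetic.

```python
from typing import Sequence


def activation_density_percent(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires (activation > threshold).

    The threshold of 0.0 is an assumption; dashboards may use a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100 * firing / len(activations)


# Synthetic corpus: 13 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 13 * 1000, 1000):
    acts[i] = 2.5

density = activation_density_percent(acts)
print(f"{density:.3f}%")
```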