INDEX
Explanations
phrases related to quotations or reported speech
mentions of a specific character or element, represented by the unique activation pattern
NEGATIVE LOGITS
Token        Logit
minim        -0.78
OTOS         -0.70
scatter      -0.68
coffin       -0.67
decomp       -0.67
cyan         -0.67
coast        -0.66
fairy        -0.66
scene        -0.66
protective   -0.66
POSITIVE LOGITS
Token        Logit
cause         0.91
then          0.89
said          0.88
according     0.88
since         0.85
especially    0.84
¯             0.83
_>            0.79
sure          0.77
yet           0.77
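Lists like the two above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and taking the most negative and most positive entries. The sketch below assumes exactly that; the names decoder_dir, W_U, vocab and top_and_bottom_logits are hypothetical, and the dashboard may apply extra scaling or normalization that is not reproduced here.

```python
import numpy as np

def top_and_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    decoder_dir: shape (d_model,), hypothetical feature output direction
    W_U:         shape (d_model, vocab_size), hypothetical unembedding matrix
    vocab:       list of token strings, length vocab_size
    """
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with random weights (not the real model's weights)
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
neg, pos = top_and_bottom_logits(
    rng.normal(size=d_model), rng.normal(size=(d_model, vocab_size)), vocab, k=3
)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading both ends keeps the two lists consistent with each other; for a large vocabulary, np.argpartition would avoid a full sort.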
Activation density: 0.181%
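Activation density is usually the fraction of token positions on which the feature fires at all. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name activation_density and the threshold parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: the feature is active on 181 of 100,000 tokens
acts = np.zeros(100_000)
acts[:181] = 1.0
print(f"{activation_density(acts):.3%}")  # → 0.181%
```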