INDEX
Explanations
Phrases related to depth or in-depth analysis
References to "deeper" insights or understandings
Negative Logits
advertising  -0.77
ery          -0.74
EED          -0.73
guard        -0.73
WATCHED      -0.73
gmail        -0.71
eting        -0.71
Guard        -0.69
Counter      -0.68
overrun      -0.67
Positive Logits
depths       0.97
than         0.94
depth        0.91
insight      0.90
layers       0.88
deeper       0.84
Than         0.83
insights     0.82
dug          0.81
vein         0.80
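Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix (W_U) and reporting the most negative and most positive entries per vocabulary token. The sketch below assumes that convention. The function names, the toy 2-dimensional model, and its 4-token vocabulary are hypothetical, and the dashboard above may additionally rescale or normalize its values, so the numbers here are not meant to reproduce the lists.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect,
# assuming effect(token) = dot(decoder_dir, W_U[:, token]).

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: logit effect}.

    decoder_dir: list of d_model floats (hypothetical feature decoder direction)
    unembed: d_model rows, each a list of len(vocab) floats (the W_U matrix)
    vocab: list of token strings, one per W_U column
    """
    effects = {}
    for j, tok in enumerate(vocab):
        effects[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example data (hypothetical, not taken from the dashboard).
decoder_dir = [1.0, 0.5]
unembed = [
    [0.9, -0.7, 0.2, 0.0],
    [0.2, -0.1, 0.4, -0.6],
]
vocab = ["depth", "advertising", "insight", "guard"]

neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), 2)
print("Negative:", neg)
print("Positive:", pos)
```

Sorting the full vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.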
Activation density: 0.009%
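Activation density is usually reported as the percentage of sampled token positions on which the feature fires (activation above zero). Under that assumption, 0.009% means the feature is active on about 9 of every 100,000 tokens (0.009 / 100 = 9 / 100,000), i.e. it is very sparse. A minimal sketch, where the function name and the zero threshold are assumptions rather than the dashboard's documented method:

```python
# Hypothetical sketch: activation density as the percent of token positions
# where the feature's activation exceeds a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 9 active positions out of 100,000 (hypothetical).
sample = [0.0] * 99_991 + [0.5] * 9
print(f"{activation_density(sample):.3f}%")
```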