Explanations
phrases related to depth or intensity
instances of the word "deep"
Negative Logits
perty    -0.71
cules    -0.68
annon    -0.66
uthor    -0.65
jee      -0.65
roma     -0.65
icans    -0.64
ATT      -0.63
hots     -0.63
alon     -0.62
Positive Logits
ened           1.03
vein           0.94
dive           0.90
breaths        0.90
pockets        0.87
dives          0.87
penetration    0.85
thro           0.84
ening          0.82
fry            0.82
Activations Density 0.034%