Explanations
references to the word "Stone"
the frequent mentions of "Stone"
Negative Logits
olulu  -0.97
merce  -0.88
ntil  -0.79
unct  -0.77
oresc  -0.77
uates  -0.76
ornia  -0.75
uate  -0.74
unal  -0.73
unrestricted  -0.72
Positive Logits
Stone  1.06
hill  1.02
Stone  0.97
breaker  0.95
bats  0.88
hook  0.88
fish  0.87
falls  0.86
rock  0.86
works  0.83
Activation density: 0.006%