INDEX
Explanations
references to a specific entity named "Stone" at different significance levels
the word "Stone" in various contexts
Negative Logits
Token        Logit
olulu        -0.86
merce        -0.83
oresc        -0.81
ornia        -0.79
uates        -0.77
ITAL         -0.76
ersive       -0.75
orescence    -0.73
unct         -0.72
unal         -0.71
Positive Logits
Token        Logit
hill         0.99
Stone        0.92
works        0.90
Stone        0.89
falls        0.89
breaker      0.88
hook         0.87
fish         0.86
house        0.85
bilt         0.84
Activations Density: 0.006%
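
To put the density figure in concrete terms, here is a minimal sketch that assumes "activation density" means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the synthetic data are hypothetical and are not taken from the page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is defined as (positions where the feature fires)
    divided by (all positions examined).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic data: the feature fires on 6 of 100,000 token positions.
sample = [0.0] * 100_000
for i in (10, 500, 2_000, 40_000, 75_000, 99_999):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # 0.006%
```

Under that assumption, a density of 0.006% corresponds to the feature firing on roughly 6 out of every 100,000 tokens.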