Explanations
mentions of the word "rock"
references to the term "rock" or its variations in various contexts
Negative Logits
URES        -0.74
URE         -0.65
ples        -0.63
urers       -0.63
unctions    -0.60
ries        -0.59
Breach      -0.59
ienced      -0.57
xus         -0.57
aver        -0.57
Positive Logits
castle      1.03
ete         1.00
star        0.98
stead       0.96
stars       0.94
enf         0.94
papers      0.91
etry        0.90
cliffe      0.90
ford        0.90
Activations Density: 0.042%