Explanations
mentions of the proper noun "Rock."
instances of the word "Rock."
Negative Logits
URES        -0.73
destro      -0.72
sidx        -0.70
tampering   -0.70
Chandra     -0.68
aver        -0.68
BILITIES    -0.68
confir      -0.67
urers       -0.66
terday      -0.64
Positive Logits
ledge       1.06
berry       1.01
castle      0.99
erness      0.98
ford        0.97
ingham      0.94
ete         0.93
Berry       0.93
Rock        0.91
cliffe      0.90
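Token/weight lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix. The k most positive and k most negative tokens are then kept. The sketch below assumes that procedure; this page does not state how its lists were computed. The names `top_bottom_logits`, `decoder_dir`, `unembed`, and `vocab` are hypothetical. The toy numbers in the demo are invented for illustration and are not this feature's weights.

```python
def top_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Project a (hypothetical) feature decoder direction onto the vocabulary.

    decoder_dir: list of d_model floats, the feature's output direction.
    unembed: d_model x vocab_size matrix, given as a list of rows.
    vocab: list of token strings, one per unembedding column.
    Returns (most_negative, most_positive), each a list of (token, weight).
    """
    d_model = len(decoder_dir)
    # Logit weight of token t = sum_i decoder_dir[i] * unembed[i][t].
    weights = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, weights))
    # Ascending order gives the most negative tokens first.
    most_negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Descending order gives the most positive tokens first.
    most_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (made-up values).
    vocab = ["Rock", "castle", "sidx", "the"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.8, 0.6, -0.4, 0.0],
        [-0.2, 0.0, 0.6, 0.1],
    ]
    neg, pos = top_bottom_logits(decoder_dir, unembed, vocab, k=2)
    print("Negative:", [(t, round(w, 2)) for t, w in neg])
    print("Positive:", [(t, round(w, 2)) for t, w in pos])
```

The ranking uses only the feature's output direction and never the input text. That is why such lists can mix related subword pieces (for example, place-name suffixes) with tokens that look unrelated.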
Activation Density 0.013%
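Activation density is usually read as the percentage of dataset tokens on which the feature fires. The sketch below assumes "fires" means an activation above zero; the page does not say which threshold or dataset it uses. Under that reading, 0.013% is roughly 13 activating tokens per 100,000, or about 1 in 7,700. The function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    activations: per-token feature activations (hypothetical input).
    threshold: assumed firing cutoff; 0.0 treats any positive value as active.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 13 active tokens out of 100,000 matches the 0.013% figure above.
    acts = [0.0] * 100_000
    for i in range(13):
        acts[i * 7_000] = 1.0
    print(f"{activation_density(acts):.3f}%")
```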