INDEX
Explanations
mentions of the planet Earth
references to the Earth
Negative Logits
Token           Logit
ussen           -0.84
acca            -0.76
ublic           -0.70
weeney          -0.66
interstitial    -0.66
hett            -0.65
enges           -0.64
UGE             -0.63
acco            -0.61
unta            -0.61
Positive Logits
Token           Logit
worm            0.92
leigh           0.91
lings           0.91
worms           0.90
Orbit           0.89
sea             0.88
works           0.83
Bound           0.82
ffen            0.82
shattering      0.81
Activations Density 0.012%