Explanations
references to people's names, especially the repeated mention of "Ruth" and "Babe Ruth"
references to the name "Ruth."
Negative Logits
gotten      -0.73
ctica       -0.67
agons       -0.62
olesc       -0.62
artney      -0.62
iator       -0.62
tein        -0.61
Helsinki    -0.61
opal        -0.61
akening     -0.60
Positive Logits
anne        0.86
Ruth        0.85
lessly      0.84
uth         0.79
less        0.77
lessness    0.74
enthal      0.71
utherford   0.71
mite        0.70
anna        0.68
Activations Density: 0.012%