INDEX
Explanations
references to a sports team named "Blues" with varying levels of importance or relevance
repeated mentions of the term "Blues."
Negative Logits
lying        -0.84
ILY          -0.81
ted          -0.75
antly        -0.74
Population   -0.72
Political    -0.72
igators      -0.71
assment      -0.68
ortion       -0.67
igated       -0.66
Positive Logits
hift         1.34
Blues        1.05
hirt         0.97
creen        0.95
blues        0.86
berries      0.83
pace         0.80
haw          0.78
hess         0.74
forth        0.71
Activations Density 0.006%
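
To put the density figure in context: 0.006% corresponds to roughly 6 active tokens per 100,000. The sketch below shows one common way such a figure could be computed, assuming density means the share of sampled tokens whose activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. The page itself does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 6 active tokens out of 100,000.
sample = [0.0] * 99_994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

A feature this sparse fires on very few tokens, which fits an explanation tied to a single term such as "Blues".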