Explanations
the word 'yr' with different activation levels
references to years
Negative Logits
Ou       -0.81
Fiesta   -0.72
leeve    -0.71
Ic       -0.70
Shack    -0.68
Papua    -0.68
Bloom    -0.66
ļé       -0.66
Canal    -0.65
Villa    -0.65
Positive Logits
rha            1.27
interstitial   0.93
rr             0.90
rh             0.90
andom          0.85
annis          0.83
umph           0.82
azines         0.82
acial          0.80
rocal          0.79
Activations Density: 0.009%