INDEX
Explanations
phrases containing the word "comrade"
mentions of a specific character or figure, particularly indicating admiration or respect
Negative Logits
strict    -0.73
straight  -0.66
pant      -0.62
Birds     -0.62
lit       -0.62
Ep        -0.62
per       -0.62
overhead  -0.61
ups       -0.61
tri       -0.60
Positive Logits
rade      5.15
rad       1.26
merce     1.26
rase      1.11
rand      1.09
rador     1.08
rus       1.07
opian     1.05
rious     1.01
uin       0.99
Activations Density 0.015%