INDEX
Explanations
Russian verbs in their infinitive form
specific characters or symbols from a non-English language
Negative Logits
GT
-0.74
clock
-0.72
waist
-0.69
Riley
-0.67
Temple
-0.64
Giles
-0.63
gag
-0.63
Clover
-0.61
shrine
-0.61
sensitivity
-0.61
Positive Logits
в
4.72
м
2.42
к
2.35
д
2.34
н
2.02
л
2.01
т (raw: ÑĤ)
1.90
р (raw: ÑĢ)
1.89
Ð
1.83
ь (raw: ÑĮ)
1.68
Activations Density 0.011%
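
Several of the positive-logit tokens (ÑĤ, ÑĢ, ÑĮ, Ð) look like mojibake. Assuming the underlying model uses a GPT-2-style byte-level BPE vocabulary, which this listing does not state, they are the tokenizer's printable stand-ins for raw UTF-8 bytes. The sketch below rebuilds that byte-to-character table and maps such strings back to text. The helper names `gpt2_bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def gpt2_bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here to be the tokenizer behind this listing)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in gpt2_bytes_to_unicode().items()}


def decode_token(token):
    """Return the text a byte-level token stands for.

    Tokens containing characters outside the table (already-decoded text such
    as Cyrillic letters) are returned unchanged. Tokens whose bytes do not
    form complete UTF-8 (a lone lead byte, for instance) return None.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for tok in ["\u00d1\u0124", "\u00d1\u0122", "\u00d1\u012e", "\u00d0"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the three two-character tokens decode to the Cyrillic letters т, р and ь, while Ð is the lone lead byte 0xD0, which begins many two-byte Cyrillic sequences and is not a complete character on its own. That reading is consistent with the single Cyrillic letters that make up the rest of the positive-logit list.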