Explanations
the phrase "ain't" and its variations, indicating a focus on informal or colloquial language
Negative Logits
ге        -0.17
erable    -0.15
utsch     -0.15
reu       -0.15
hausen    -0.15
idge      -0.15
laces     -0.15
主        -0.14
ñas       -0.14
enr       -0.14
Positive Logits
't        0.18
’t        0.16
Soph      0.16
agle      0.15
tright    0.15
ult       0.15
ixo       0.15
dio       0.14
рович     0.14
éli       0.14
Activations Density 0.008%