Explanations
instances of the word "t."
Negative Logits
cination   -0.55
persons    -0.50
այ         -0.49
м          -0.49
"          -0.49
'          -0.46
☉          -0.46
Persons    -0.46
ONLY       -0.46
only       -0.45
Positive Logits
’)     1.42
’.     1.35
’).    1.34
’,     1.33
’:     1.25
’?     1.24
’”     1.22
’;     1.21
)’     1.19
”),    1.18
Activations Density 0.092%