INDEX
Explanations
sentences ending with a specific format of the word 's'
Negative Logits
Tours        -0.80
Sev          -0.69
RN           -0.69
Salon        -0.67
Sources      -0.65
Moff         -0.61
Alexandria   -0.61
Marshal      -0.60
Talks        -0.60
Mobil        -0.60
Positive Logits
uddenly      1.23
lightly      1.14
pecially     1.13
omew         1.10
ELF          1.04
ometimes     1.00
ustainable   1.00
atisf        0.94
outhern      0.94
leeve        0.90
Activations Density: 0.140%