Explanations
non-English characters and symbols
special characters or symbols in the text
Negative Logits
thirds      -0.72
lapt        -0.70
ciating     -0.69
wounding    -0.69
teenth      -0.67
satell      -0.64
etheless    -0.62
emouth      -0.61
womb        -0.61
primates    -0.61
Positive Logits
ï¸ı         1.12
ORTS        0.99
rade        0.98
ï¸          0.93
ACK         0.87
acket       0.82
acked       0.82
encers      0.81
ingu        0.78
unity       0.78
Activations Density 0.024%