INDEX
Explanations
Punctuation and numerical values in the text
NEGATIVE LOGITS
adam      -0.15
ilent     -0.15
ymph      -0.14
Touch     -0.14
ichel     -0.14
Touch     -0.13
YM        -0.13
eum       -0.13
ief       -0.13
uesta     -0.13
POSITIVE LOGITS
ãĥ§       0.14
iot       0.14
igsaw     0.14
odiac     0.13
ena       0.13
gboolean  0.13
ysi       0.13
git       0.13
Shuttle   0.13
ards      0.13
Activations Density 0.041%