Explanations
numbers and special characters mixed within text
special characters or symbols in the text
Negative Logits
raints       -0.93
manif        -0.85
Instr        -0.84
horizont     -0.82
philos       -0.79
womb         -0.76
ngth         -0.75
tentacles    -0.74
condem       -0.73
symp         -0.73
Positive Logits
âĶĢâĶĢ                          1.07
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ        1.04
ishable                         1.00
ãĥ¼ãĥ                           0.99
à©                              0.92
cffffcc                         0.90
ļ                               0.87
ãĥ¼ãĥ«                          0.85
à¨                              0.85
ा                               0.85
Activations Density 0.059%