INDEX
Explanations
symbols or punctuation marks used for emphasis or to separate thoughts
the presence of a specific symbol or character sequence
Negative Logits
scatter      -0.72
decomp       -0.71
mixed        -0.68
exhausted    -0.68
unmarked     -0.65
etsy         -0.63
romy         -0.62
oldest       -0.62
memoir       -0.61
fid          -0.61
Positive Logits
âĹ¼     1.01
¬       0.94
º       0.93
âĢł     0.89
Ĵ       0.89
§       0.89
į       0.88
âĢķ     0.86
¯       0.86
ij       0.85
Activations Density 0.380%
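
The positive-logit tokens above look like the printable stand-ins that a GPT-2-style byte-level BPE vocabulary uses for raw bytes. The source does not name the tokenizer, so that is an assumption. Under it, the sketch below rebuilds the byte-to-character table and maps each token string back to its bytes. The helper names (`bytes_to_unicode`, `token_to_bytes`, `describe`) are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    ASSUMPTION: the dashboard's tokens use this mapping; the source does not say so.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAD))
          + list(range(0xAE, 0x100)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Convert a displayed token string back to the raw bytes it stands for."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token):
    """Return the decoded text, or a hex dump if the bytes are not complete UTF-8."""
    raw = token_to_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return "partial UTF-8 bytes: " + raw.hex(" ")


if __name__ == "__main__":
    # A few token strings copied from the Positive Logits list.
    for tok in ["âĹ¼", "âĢł", "âĢķ", "¬", "Ĵ"]:
        print(repr(tok), "->", describe(tok))
```

If the assumption holds, the three-character tokens are complete UTF-8 sequences for symbols (for example, âĢł corresponds to the dagger †), while the single-character tokens are lone bytes from multi-byte sequences. Both readings fit the explanations listed above about symbols and specific character sequences.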