INDEX
Explanations
phrases indicating a strong opinion or directive
the recurring use of a specific symbol or character in various contexts
Negative Logits
decomp     -0.81
hob        -0.69
Codec      -0.68
unmarked   -0.67
aid        -0.66
shack      -0.66
parap      -0.66
gib        -0.65
filming    -0.65
physi      -0.63
Positive Logits
¢          0.97
agree      0.95
¬          0.94
ı          0.93
elong      0.93
£          0.90
erest      0.89
º          0.87
¯          0.86
Ĵ          0.85
Activations Density 0.396%
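
One plausible reading of the density figure is the fraction of token positions on which the feature has a nonzero activation. The page does not state how the number was computed, so this is an assumption. The sketch below illustrates that reading; `activation_density`, its `threshold` parameter, and the toy data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of them active.
toy = [0.0] * 996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.396% would mean the feature fires on roughly 4 of every 1000 token positions.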