INDEX
Explanations
words related to political or controversial topics
instances of the character 'Ļ' in the text
Negative Logits
sacrific     -0.76
hurd         -0.69
Tanz         -0.67
Assy         -0.67
loopholes    -0.65
Allied       -0.63
mathemat     -0.63
vulner       -0.63
handic       -0.62
demoral      -0.62
Positive Logits
ï¸ı          1.12
shall        1.09
ï¸           0.97
s            0.97
tracks       0.94
ski          0.91
sure         0.89
ship         0.89
right        0.89
ser          0.86
Activations Density 0.251%
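
The page does not say how the density figure is computed. A common reading is the fraction of token positions on which the feature fires at all, and the sketch below follows that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and synthetic, chosen only so the arithmetic lands on the same percentage; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active, not a mean or a histogram statistic.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 251 active positions out of 100,000 tokens.
sample = [0.0] * 99_749 + [0.8] * 251
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.251% means the feature is silent on roughly 399 out of every 400 tokens, which is typical of a sparse feature.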