INDEX
Explanations
phrases related to personal experiences and opinions
Negative Logits
topic         -0.78
metic         -0.73
limb          -0.72
mathemat      -0.72
dossier       -0.71
detail        -0.70
territorial   -0.69
stake         -0.68
vulner        -0.68
contested     -0.68
Positive Logits
ï¸ı           1.22
ï¸            0.96
there         0.95
RIP           0.89
ãĥĥãĥī        0.87
âĹ¼           0.87
nob           0.86
sure          0.85
everyone      0.85
hey           0.84
Activations Density 0.030%