INDEX
Explanations
full names or proper nouns, possibly related to news or events occurring in a specific context
occurrences of a specific symbol or punctuation mark
Negative Logits
obser          -0.91
conception     -0.76
awaru          -0.75
puff           -0.75
imagination    -0.74
ende           -0.73
imperson       -0.72
downed         -0.72
unconscious    -0.71
halluc         -0.70
Positive Logits
¯       1.03
ï¸ı     0.85
âĢł     0.85
said    0.84
tab     0.81
tra     0.81
tre     0.81
£       0.79
âĪ      0.77
°       0.77
Activations Density 0.269%