INDEX
Explanations
mentions of famous individuals or specific topics in various fields, potentially related to current events
proper nouns and specific entities, often in the context of questions or discussions about them
Negative Logits
ggles      -0.89
details    -0.77
edIn       -0.72
çīĪ        -0.69
roups      -0.68
çļ         -0.68
irts       -0.67
":"","     -0.67
ook        -0.66
udes       -0.65
Positive Logits
supposed      1.27
gonna         1.06
worth         1.01
able          1.00
really        1.00
contagious    0.98
ready         0.97
going         0.96
aware         0.95
REALLY        0.94
Activations Density: 0.114%