INDEX
Explanations
mentions of disapproval or criticism
a specific character or symbol in the text
Negative Logits
Token         Logit
deed          -0.69
tabl          -0.69
Norn          -0.68
apes          -0.68
Slug          -0.67
prefrontal    -0.65
telesc        -0.64
pleasures     -0.64
condem        -0.63
Directorate   -0.62
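Logit lists like the one above, and the positive list that follows, are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading the ranking from both ends. The dashboard does not say exactly how its numbers were computed (for example, whether a final layer norm is folded in), so the sketch below is only an assumption-labeled illustration. The function names `logit_effects` and `top_and_bottom_logits` and the toy weights are hypothetical, and plain Python lists are used instead of a tensor library.

```python
def logit_effects(decoder_direction, unembed):
    """Direct logit effect of a feature: decoder_direction @ W_U.

    Hypothetical helper. `unembed` is a d_model x vocab_size matrix (list of rows),
    so the result has one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_direction, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom_logits(effects, vocab, k=10):
    """Rank tokens by effect; return (most negative k, most positive k)."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]                # tokens the feature suppresses most
    top = list(reversed(ranked[-k:]))  # tokens the feature promotes most
    return bottom, top


# Toy example with made-up weights (d_model = 2, vocab of 3 tokens).
direction = [1.0, -0.5]
W_U = [[0.2, -0.4, 0.9],
       [0.6, 0.1, -0.2]]
effects = logit_effects(direction, W_U)
bottom, top = top_and_bottom_logits(effects, ["a", "b", "c"], k=1)
print(bottom, top)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; on a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.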
Positive Logits
Token                                   Logit
U+FE0F (variation selector-16)          1.09
── (U+2500 x2, box-drawing line)        0.95
ternity                                 0.92
lean                                    0.92
ever                                    0.91
■ (U+25A0, black square)                0.87
\-                                      0.85
conom                                   0.84
··                                      0.83
very                                    0.83
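Several positive-logit tokens were displayed in their raw byte-level BPE form ("ï¸ı", "âĶĢâĶĢ", "âĸł"), where each byte of a UTF-8 sequence is shown as a printable stand-in character. The sketch below decodes them, assuming the model uses a GPT-2-style byte-level tokenizer (the dashboard does not name the model). The mapping table follows the `bytes_to_unicode` scheme from OpenAI's GPT-2 encoder; `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """GPT-2-style table mapping each byte 0..255 to a printable stand-in character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256, 257, ...
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table so each stand-in character maps back to its raw byte.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into real UTF-8 text (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ï¸ı", "âĶĢâĶĢ", "âĸł"]:
    text = decode_token(tok)
    print(tok, "->", [f"U+{ord(c):04X}" for c in text])
```

Printing code points rather than the characters themselves matters for the first token: U+FE0F is an invisible variation selector that normally follows an emoji or symbol, so it would otherwise look like an empty string.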
Activation density: 0.200%
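Activation density is usually reported as the share of sampled token positions on which the feature fires. The corpus and firing threshold behind the 0.200% figure are not given here, so the sketch below assumes "fires" means an activation above 0. At 0.200%, the feature is active on roughly 1 token in 500. The function name and the activation values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 500 token positions.
acts = [0.0] * 499 + [3.1]
print(f"{activation_density(acts):.3%}")  # 0.200%
```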