INDEX
Explanations
text surrounded by special characters in a repetitive pattern
titles or phrases related to expressions of dislike or hate
Negative Logits
hement  -0.69
nesday  -0.61
favoured  -0.59
honoured  -0.58
destro  -0.56
scrap  -0.51
uggest  -0.50
diseng  -0.49
favour  -0.48
ĸļ  -0.48
Positive Logits
Associated  0.71
ccording  0.70
................................................................  0.68
================================================================  0.68
................................  0.66
--------------------------------------------------------  0.65
--------------------------------  0.64
++++++++++++++++  0.64
OVER  0.63
Pand  0.63
Activations Density 0.560%
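The dashboard does not say how these figures are computed. A common convention in sparse-autoencoder feature viewers, assumed here rather than confirmed by this page, is that the two logit lists are the tokens with the most negative and most positive entries in the feature's per-token logit-effect vector, and that activation density is the percentage of tokens on which the feature's activation is above zero. The following is a minimal Python sketch under those assumptions; the function names and toy data are hypothetical.

```python
import numpy as np


def top_and_bottom_logits(logit_effects, vocab, k=10):
    """Hypothetical helper: return the k lowest and k highest (token, value) pairs.

    Assumes logit_effects[i] is the feature's logit effect on token vocab[i].
    """
    order = np.argsort(logit_effects)  # indices sorted by ascending value
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(activations):
    """Hypothetical helper: percentage of tokens where the activation is > 0."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > 0) / activations.size


# Toy usage with made-up values (not taken from this feature).
neg, pos = top_and_bottom_logits(np.array([0.7, -0.6, 0.1]), ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -0.6)] [('a', 0.7)]
print(activation_density([0, 0.5, 0, 0, 1.2]))  # 40.0
```

Sorting once with `argsort` and slicing from both ends yields both lists in a single pass over the vocabulary; a real dashboard would typically do this over tens of thousands of tokens.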