INDEX
Explanations
the presence of a special character or marker indicating a new section or theme
references to specific locations or entities named "White"
Negative Logits
cffffcc       -0.92
ategory       -0.78
udeau         -0.77
ANS           -0.76
ITAL          -0.75
Downloadha    -0.75
awaru         -0.73
APH           -0.72
yrinth        -0.72
REM           -0.72
Positive Logits
caps           1.19
horse          1.11
supremacist    1.06
Sox            1.05
hall           1.03
house          1.00
beard          1.00
supremacists   0.97
bread          0.96
head           0.92
Activations Density 0.030%
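
The density figure above is usually read as the share of tokens on which the feature is active. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: `activation_density`, the zero threshold, and the sample data are all hypothetical, chosen so the result matches the 0.030% shown.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 10,000 tokens.
acts = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse, firing on roughly 3 tokens in every 10,000.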