INDEX
Explanations
keywords related to historical events or political topics
instances of empty or filler text
Negative Logits
raints      -0.76
matic       -0.71
urated      -0.68
Instr       -0.68
condem      -0.65
monop       -0.65
ciating     -0.63
enegger     -0.62
primates    -0.60
apes        -0.59
Positive Logits
âĶĢâĶĢ                      1.13
ï¸ı                         1.06
uthor                       0.86
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    0.83
ĺ                           0.82
×Ķ                          0.82
âĢł                         0.82
ļ                           0.80
fter                        0.79
âĸł                         0.78
Activations Density 0.201%
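
Several of the positive-logit tokens above (for example âĶĢâĶĢ and âĢł) look like the printable stand-ins that byte-level BPE tokenizers use for raw bytes. The sketch below assumes the feature comes from a model with a GPT-2-style byte-level vocabulary, which the listing itself does not state. Under that assumption it rebuilds the byte-to-character table and maps each token string back to the text it encodes. The names bytes_to_unicode and decode_token are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point from 256 upwards.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a displayed token string back to text (assumes GPT-2-style encoding)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Tokens that hold only part of a UTF-8 sequence decode to U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĶĢâĶĢ", "ï¸ı", "×Ķ", "âĢł", "âĸł", "ĺ", "ļ", "uthor"]:
        decoded = decode_token(tok)
        points = " ".join(f"U+{ord(c):04X}" for c in decoded)
        print(f"{tok!r} -> {decoded!r} ({points})")
```

If the assumption holds, the top positive tokens are mostly box-drawing and typographic symbols, a variation selector, and a Hebrew letter, while ĺ and ļ are lone UTF-8 continuation bytes that cannot be decoded on their own.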