INDEX
Explanations
people's names
instances of a specific character or letter in the text
Negative Logits
disadvant    -0.76
Palestin     -0.72
mathemat     -0.71
lawy         -0.66
contrace     -0.63
misunder     -0.62
fortun       -0.62
incorpor     -0.62
Instr        -0.62
obser        -0.61
Positive Logits
ï¸ı          1.22
ï¸           0.87
Balt         0.78
âĶĢâĶĢ       0.75
âĻ           0.74
âĹ           0.73
tre          0.69
âĸł          0.69
eric         0.68
£            0.68
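Several of the positive-logit tokens above (for example ï¸ı and âĸł) look like the printable stand-in characters that GPT-2 style byte-level BPE vocabularies use for raw bytes. The source does not name the tokenizer, so this is an assumption. Under that assumption, a minimal sketch for mapping such a token string back to text could look like the following. The helper name `decode_token` is hypothetical, and tokens that hold only part of a multi-byte character decode to the replacement character U+FFFD.

```python
def bytes_to_unicode():
    """Byte -> stand-in character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's model uses this convention; the source
    does not say which tokenizer produced these token strings.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(0x21, 0x7F))      # '!' .. '~'
          + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
          + list(range(0xAE, 0x100)))  # U+00AE .. U+00FF
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into text (hypothetical helper).

    Raises KeyError for characters outside the table. Incomplete UTF-8
    sequences become U+FFFD instead of raising.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["Balt", "\u00e2\u0136\u0122\u00e2\u0136\u0122", "\u00e2\u0138\u0142"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the token listed at 0.75 is a pair of box-drawing horizontal lines, the one at 0.69 is a black square, and the top token at 1.22 is the variation selector U+FE0F. Plain ASCII tokens such as Balt pass through unchanged.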
Activations Density 0.334%