INDEX
Explanations
references to specific names or entities related to news, events, and locations
names of people or entities associated with specific events or discussions
Negative Logits
Token        Logit
different    -0.69
²¾           -0.66
Ö¼           -0.65
otine        -0.64
»            -0.62
destro       -0.61
ecause       -0.58
Magikarp     -0.57
aspers       -0.57
ggle         -0.53
Positive Logits
Token            Logit
.;               1.12
;                0.93
;;;;;;;;;;;;     0.91
.:               0.87
Photograph       0.87
âĵĺ              0.86
.,               0.86
};               0.84
âĨij              0.84
<|endoftext|>    0.83
Activations Density: 0.287%