Explanations
specific phrases related to events like news headlines or urgent messages
topics related to emergencies and significant societal issues
Negative Logits
Token                 Logit
ranch                 -0.68
é¾į                   -0.61
tremend               -0.60
princ                 -0.59
redirected            -0.58
aph                   -0.57
rats                  -0.57
channelAvailability   -0.56
flyers                -0.56
legends               -0.56
Positive Logits
Token   Logit
02      1.73
01      1.67
03      1.66
04      1.60
00      1.57
05      1.55
06      1.50
07      1.41
08      1.39
09      1.31
Activations Density 0.026%
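
To give a sense of scale for the density figure, the sketch below assumes that activation density means the fraction of token positions on which the feature activates above zero. That definition is an assumption rather than something stated on this page. The function name `activation_density`, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: the page does not state how density is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 26 of which activate the feature.
acts = [0.0] * 100_000
for i in range(26):
    acts[i * 3000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.026% corresponds to roughly 26 active positions per 100,000 tokens.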