Explanations
mentions of the word "attacks" with a high activation value
mentions of "cats."
Negative Logits
SOURCE        -0.77
pall          -0.75
SPONSORED     -0.66
Quarterly     -0.65
divest        -0.64
commencement  -0.62
curv          -0.62
planetary     -0.62
©¶æ           -0.61
concurrent    -0.59
Positive Logits
ats            1.35
wana           1.07
terness        1.03
icket          1.03
abase          1.02
herer          0.96
heet           0.96
acus           0.93
htaking        0.92
chers          0.92
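The page does not say how the logit values above are produced. On SAE-style feature dashboards they are commonly obtained by taking the dot product of the feature's decoder direction with each token's unembedding vector, then listing the most negative and most positive tokens; the sketch below assumes that convention. The functions `logit_effects` and `top_logits` and the toy vectors are hypothetical and do not reproduce the numbers listed here.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of a feature direction with each token's
    unembedding vector, giving that token's direct logit contribution."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy 2-dimensional example; these vectors are invented for illustration only.
direction = [1.0, -0.5]
unembed = {
    "ats": [1.0, -0.3],
    "wana": [0.8, 0.0],
    "SOURCE": [-0.5, 0.5],
    "pall": [-0.2, 0.9],
}
negative, positive = top_logits(logit_effects(direction, unembed), k=2)
print("negative:", negative)
print("positive:", positive)
```

With a real model the same ranking would run over the full vocabulary, which is why byte-level tokens such as `©¶æ` can appear in the lists.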
Activation density: 0.012%
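Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero; the page does not define it, so that reading is an assumption here. A minimal sketch, with a hypothetical `activation_density` helper and invented sample data, shows how a figure like 0.012% arises: 12 active tokens out of 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Invented sample: the feature fires on 12 of 100,000 tokens.
acts = [0.0] * 99_988 + [2.5] * 12
print(activation_density(acts))  # → 0.012
```

A density this low means the feature is silent on almost every token, which is consistent with a narrowly specific feature.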