INDEX
Explanations
words related to grilling or causing harm/violence
various forms of the verb "kill" and related terms
Negative Logits
critically  -0.68
IEEE        -0.65
Pros        -0.64
peer        -0.63
trop        -0.62
chrome      -0.61
Bet         -0.60
PRES        -0.59
Barn        -0.58
BOOK        -0.58
Positive Logits
illing      1.19
espie       0.97
illed       0.94
sburg       0.89
iatus       0.86
horizont    0.85
ills        0.80
ustration   0.80
imentary    0.79
iard        0.78
Activations Density 0.004%