INDEX
Explanations
references to physical attacks or conflicts
references to mugging incidents and associated discussions
Negative Logits
Token     Logit
cells     -0.75
cel       -0.75
ed        -0.75
ters      -0.74
coe       -0.73
nd        -0.72
ducers    -0.72
ek        -0.71
er        -0.69
seed      -0.69
Positive Logits
Token     Logit
ifully    0.90
iless     0.80
iful      0.78
Hots      0.77
enance    0.74
istics    0.70
lust      0.67
llan      0.65
ãĢij       0.65
ainment   0.65
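The page does not say how the two logit lists above are produced. A common approach, assumed here, is to project the feature's output direction through the model's unembedding matrix and report the tokens with the lowest and highest resulting scores. The sketch below illustrates that idea in plain Python; the function name `top_and_bottom_logits`, the two-dimensional vectors, and all weights are hypothetical toy values, not data from this feature.

```python
def top_and_bottom_logits(direction, unembed_rows, vocab, k=2):
    """Rank tokens by the dot product of their unembedding row with a feature direction.

    Returns (most_negative, most_positive), each a list of (token, score) pairs.
    Assumption: this mirrors how such dashboards derive their logit lists.
    """
    scores = [
        (tok, sum(d * w for d, w in zip(direction, row)))
        for tok, row in zip(vocab, unembed_rows)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy data: token strings are borrowed from the lists above,
# but the weights are invented purely for illustration.
vocab = ["cells", "ifully", "seed", "iless"]
unembed_rows = [[-1.0, 0.5], [1.0, 0.25], [-0.5, -0.5], [0.75, 0.5]]
direction = [0.8, 0.2]

negative, positive = top_and_bottom_logits(direction, unembed_rows, vocab)
print("negative:", [tok for tok, _ in negative])
print("positive:", [tok for tok, _ in positive])
```

With real model weights, `unembed_rows` would have one row per vocabulary entry and `k` would be 10 to match the lists shown here.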
Activations Density: 0.158%
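The density figure above is read here as the fraction of sampled token positions on which the feature fires. That reading is an assumption, since the page does not define the metric. Under it, the calculation is a simple count; the helper `activation_density`, the zero threshold, and the sample data below are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active; the dashboard itself does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 16 of them active.
sample = [0.0] * 10_000
for i in range(0, 10_000, 625):  # positions 0, 625, ..., 9375
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

A density of 0.158% would correspond to roughly 158 active positions per 100,000 sampled tokens under this definition.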