INDEX
Explanations
phrases indicating intent or purpose related to criminal or harmful actions
Negative Logits
visor      -0.78
Chocobo    -0.72
visors     -0.69
Cham       -0.67
Jenner     -0.67
Serge      -0.66
Jub        -0.66
Britann    -0.65
Tycoon     -0.65
Jackets    -0.64
Positive Logits
ful        0.94
ensity     0.89
ality      0.86
fulness    0.86
intent     0.86
uality     0.82
edly       0.80
ually      0.80
eous       0.79
ruction    0.78
Activations Density: 0.009%