INDEX
Explanations
phrases related to news events or actions
Negative Logits
Token       Logit
bene        -0.65
brim        -0.64
omission    -0.60
Detected    -0.58
absence     -0.55
silhou      -0.55
lication    -0.55
Glob        -0.54
inval       -0.53
annex       -0.53
Positive Logits
Token       Logit
touch       0.96
trouble     0.82
Touch       0.79
retty       0.76
touch       0.72
itialized   0.72
grips       0.71
way         0.70
offensive   0.70
ked         0.69
Activations Density: 0.115%