INDEX
Explanations
references to editorial roles in written content
mentions of the word "editor."
Negative Logits
bley        -0.77
pered       -0.72
achi        -0.72
enegger     -0.67
ilitary     -0.67
ptoms       -0.66
ffff        -0.66
00200000    -0.62
Fighters    -0.62
rises       -0.62
Positive Logits
ials        1.28
ially       1.13
ettings     0.81
editor      0.80
furt        0.80
iate        0.78
rors        0.74
Picks       0.74
hips        0.72
ror         0.72
Activations Density 0.025%