INDEX
Explanations
the word "Patch" with a strong preference for occurrences where it has a high activation value
terms related to software patches and updates
Negative Logits
obser         -0.75
¢             -0.69
Galile        -0.66
minist        -0.66
ministry      -0.65
trave         -0.64
contrad       -0.64
VERTISEMENT   -0.64
temptation    -0.64
FANT          -0.63
Positive Logits
work          1.01
ioned         1.00
ions          0.92
notes         0.90
Patch         0.82
patch         0.82
ion           0.78
patched       0.78
ION           0.76
hern          0.76
Activations Density 0.024%